GPQA-Diamond
Emerging, 3 papers using it
First seen: 2025
The GPQA-Diamond benchmark is used in multi-agent debate scenarios to evaluate the mechanisms of stance convergence and conformity in large language models (LLMs).