GPQA-Diamond

Emerging
3 papers using it
First seen: 2025

GPQA-Diamond is a subset of the GPQA benchmark of graduate-level multiple-choice science questions. The papers tracked here use it in multi-agent debate scenarios to evaluate the mechanisms of stance convergence and conformity in large language models (LLMs).

Papers using GPQA-Diamond (3)
