GPQA-Diamond

Emerging
4 papers using it
First seen: 2025

GPQA Diamond is a benchmark of graduate-level, multiple-choice science questions (biology, physics, and chemistry). It is used to evaluate how well reasoning models perform on difficult reasoning tasks.

Papers using GPQA-Diamond (4)

GPQA-Diamond | datasets | ai-for-code