
GPQA-Diamond

Emerging
23 papers using it
6,511 HF downloads
9 HF likes
First seen: 2025

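GPQA-Diamond is the highest-quality subset of the GPQA benchmark: 198 graduate-level, four-option multiple-choice questions in biology, physics, and chemistry. The papers listed here use it as a reasoning benchmark, for example to evaluate the performance of quantized Large Reasoning Models (LRMs) during fine-tuning.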

Papers using GPQA-Diamond (23)
