
GPQA-Diamond

Emerging

4 papers using it

First seen: 2025

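GPQA-Diamond is a benchmark for evaluating reasoning models. It is the highest-quality subset of GPQA (Graduate-Level Google-Proof Q&A) and consists of 198 expert-written multiple-choice questions in biology, physics, and chemistry.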


Papers using GPQA-Diamond (4)

OpenThoughts: Data Recipes for Reasoning Models · 2025 · 1 citation

QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals · 2026

PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference · 2026

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model · 2025
