GPQA-Diamond
Emerging
23 papers using it
6,511 HF downloads
9 HF likes
First seen: 2025
GPQA-Diamond is a benchmark dataset of reasoning tasks, used for example to evaluate the performance of quantized Large Reasoning Models (LRMs) during fine-tuning.
Papers using GPQA-Diamond (23)
- Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction
- Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
- Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models
- QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals
- Improving Data and Reward Design for Scientific Reasoning in Large Language Models
- Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
- PRISM: Demystifying Retention and Interaction in Mid-Training
- Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
- D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models
- Asking LLMs to Verify First is Almost Free Lunch
- Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
- DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
- Process Reward Models That Think
- Prior Prompt Engineering for Reinforcement Fine-Tuning
- Void in Language Models
- First Finish Search: Efficient Test-Time Scaling in Large Language Models
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
- Answer Matching Outperforms Multiple Choice for Language Model Evaluation
- Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning
- Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning