GPQA Diamond Leaderboard
Graduate-level Google-Proof Q&A across biology, physics, and chemistry. Hard, PhD-grade questions. · Metric: Accuracy (higher is better)
| # | Model | Accuracy | Paper |
|---|---|---|---|
| 1 | Daemontatox/Llama3.3-70B-CogniLink | 26.06 | - |
| 2 | Daemontatox/PathFinderAi3.0 | 21.14 | - |
| 3 | EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 | 21.14 | - |
| 4 | CultriX/Qwen2.5-14B-ReasoningMerge | 21.03 | - |
| 5 | Baptiste-HUVELLE-10/LeTriomphant2.2_ECE_iLAB | 19.91 | - |
| 6 | 1024m/PHI-4-Hindi | 19.69 | - |
| 7 | Cran-May/merge_model_20250308_4 | 19.69 | - |
| 8 | CultriX/Qwen2.5-14B-Hyperionv4 | 19.69 | - |
| 9 | Daemontatox/CogitoZ | 19.35 | - |
| 10 | DoppelReflEx/MiniusLight-24B-v1c-test | 19.35 | - |
| 11 | DoppelReflEx/MiniusLight-24B-v1d-test | 19.35 | - |
| 12 | Daemontatox/PathfinderAI | 19.24 | - |
| 13 | EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2 | 19.24 | - |
| 14 | CultriX/Qwen2.5-14B-Wernicke | 19.13 | - |
| 15 | CultriX/Qwen2.5-14B-Hyper | 18.90 | - |
| 16 | Cran-May/merge_model_20250308_2 | 18.79 | - |
| 17 | Aashraf995/QwenStock-14B | 18.57 | - |
| 18 | CultriX/Qwen2.5-14B-Broca | 18.23 | - |
| 19 | DavidAU/DeepSeek-R1-Distill-Qwen-25.5B-Brainstorm | 18.12 | - |
| 20 | CultriX/Qwen2.5-14B-Ultimav2 | 18.01 | - |
| 21 | Danielbrdz/Barcenas-14b-phi-4 | 17.79 | - |
| 22 | CultriX/SeQwence-14B-EvolMerge | 17.45 | - |
| 23 | 01-ai/Yi-1.5-9B | 17.23 | - |
| 24 | CultriX/Qwen2.5-14B-MegaMerge-pt2 | 17.23 | - |
| 25 | DoppelReflEx/MiniusLight-24B-v1b-test | 17.23 | - |
| 26 | Danielbrdz/Barcenas-14b-phi-4-v2 | 17.11 | - |
| 27 | CultriX/SeQwence-14B-EvolMergev1 | 16.89 | - |
| 28 | BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B | 16.78 | - |
| 29 | CultriX/Qwen2.5-14B-MergeStock | 16.44 | - |
| 30 | CultriX/Qwen2.5-14B-Hyperionv5 | 16.22 | - |
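
For readers unfamiliar with the metric, the following is a minimal sketch of how a plain accuracy percentage on a multiple-choice benchmark could be computed. The function name `accuracy_percent` and the answer-letter format are illustrative assumptions, not the leaderboard's own scoring code, and the page does not state whether the scores above involve any further normalization.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that exactly match the reference answers.

    Hypothetical helper for illustration only; not the leaderboard's scoring code.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count exact matches between predicted and reference answer letters.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Example: three of four multiple-choice answers match the reference.
score = accuracy_percent(["A", "B", "C", "D"], ["A", "B", "D", "D"])
print(f"{score:.2f}")  # prints 75.00
```

Reporting the value with two decimal places matches the format of the Accuracy column.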