MMLU-PRO Leaderboard
Successor to MMLU: 12K questions, harder distractors, less contamination. 5-shot accuracy. · Metric: Accuracy (higher is better)
| # | Model | Accuracy | Paper |
|---|---|---|---|
| 1 | Aryanne/QwentileSwap | 54.95 | - |
| 2 | Baptiste-HUVELLE-10/LeTriomphant2.2_ECE_iLAB | 53.90 | - |
| 3 | EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 | 53.48 | - |
| 4 | Daemontatox/PathFinderAi3.0 | 52.86 | - |
| 5 | CombinHorizon/huihui-ai-abliterated-Qwen2.5-32B-Inst-BaseMerge-TIES | 52.45 | - |
| 6 | CombinHorizon/zetasepic-abliteratedV2-Qwen2.5-32B-Inst-BaseMerge-TIES | 52.05 | - |
| 7 | BenevolenceMessiah/Qwen2.5-72B-2x-Instruct-TIES-v1.0 | 51.43 | - |
| 8 | Daemontatox/CogitoZ | 51.03 | - |
| 9 | Daemontatox/PathFinderAI2.0 | 50.52 | - |
| 10 | Daemontatox/PathfinderAI | 50.47 | - |
| 11 | DoppelReflEx/MiniusLight-24B-v1d-test | 49.87 | - |
| 12 | DoppelReflEx/MiniusLight-24B-v1c-test | 49.86 | - |
| 13 | CultriX/Qwen2.5-14B-Wernicke | 49.15 | - |
| 14 | CultriX/Qwestion-14B | 49.14 | - |
| 15 | CultriX/Qwen2.5-14B-MegaMerge-pt2 | 49.12 | - |
| 16 | Cran-May/merge_model_20250308_2 | 49.11 | - |
| 17 | CultriX/SeQwence-14B | 49.10 | - |
| 18 | CultriX/SeQwence-14B-EvolMerge | 49.10 | - |
| 19 | CultriX/Qwen2.5-14B-Ultimav2 | 49.08 | - |
| 20 | CultriX/SeQwence-14B-v5 | 49.05 | - |
| 21 | CultriX/Qwen2.5-14B-MergeStock | 48.84 | - |
| 22 | CultriX/SeQwence-14B-EvolMergev1 | 48.81 | - |
| 23 | Aashraf995/QwenStock-14B | 48.69 | - |
| 24 | CultriX/Qwen2.5-14B-Hyper | 48.60 | - |
| 25 | Cran-May/merge_model_20250308_4 | 48.52 | - |
| 26 | DoppelReflEx/MiniusLight-24B-v1b-test | 48.50 | - |
| 27 | CultriX/Qwen2.5-14B-Broca | 48.49 | - |
| 28 | CultriX/Qwen2.5-14B-Hyperionv4 | 48.49 | - |
| 29 | CultriX/Qwen2.5-14B-ReasoningMerge | 48.28 | - |
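
For readers unfamiliar with the metric, the sketch below shows one way an accuracy figure like those in the table could be computed from multiple-choice predictions. It assumes the score is the plain percentage of correctly answered questions; the page does not say whether the listed values are raw or normalized, so this should not be read as the leaderboard's actual scoring code. The function name `accuracy_percent` and the toy data are hypothetical.

```python
def accuracy_percent(predictions, answers):
    """Return the share of correct predictions as a percentage.

    Assumption: plain percent-correct scoring; the leaderboard's
    exact scoring procedure is not stated in the source.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count positions where the predicted option matches the gold option.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: four questions, three answered correctly.
predictions = ["A", "C", "B", "D"]
answers = ["A", "C", "D", "D"]
print(accuracy_percent(predictions, answers))  # 75.0
```

In a 5-shot setup, each test question would be preceded by five worked examples in the prompt; that affects how predictions are produced, not how they are scored here.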