MuSR Leaderboard
Multistep Soft Reasoning (MuSR): long-form reasoning over fictional narratives · Metric: Accuracy (higher is better)
| # | Model | Accuracy | Paper |
|---|---|---|---|
| 1 | CultriX/Qwen2.5-14B-ReasoningMerge | 25.61 | n/a |
| 2 | DavidAU/DeepSeek-R1-Distill-Qwen-25.5B-Brainstorm | 25.24 | n/a |
| 3 | Danielbrdz/Barcenas-14b-phi-4 | 24.24 | n/a |
| 4 | CultriX/Qwen2.5-14B-Ultimav2 | 22.04 | n/a |
| 5 | 1024m/PHI-4-Hindi | 21.52 | n/a |
| 6 | Daemontatox/Llama3.3-70B-CogniLink | 21.40 | n/a |
| 7 | CultriX/Qwen2.5-14B-Hyper | 21.03 | n/a |
| 8 | Daemontatox/PathfinderAI | 20.83 | n/a |
| 9 | Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO | 20.53 | n/a |
| 10 | CohereForAI/c4ai-command-r-plus | 20.42 | n/a |
| 11 | CultriX/SeQwence-14B-EvolMerge | 20.26 | n/a |
| 12 | AbacusResearch/Jallabi-34B | 20.24 | n/a |
| 13 | CultriX/Qwen2.5-14B-Brocav7 | 20.15 | n/a |
| 14 | Daemontatox/PathFinderAi3.0 | 20.05 | n/a |
| 15 | AI-Sweden-Models/Llama-3-8B-instruct | 19.94 | n/a |
| 16 | Daemontatox/CogitoZ | 19.94 | n/a |
| 17 | CultriX/Qwen2.5-14B-Hyperionv5 | 19.88 | n/a |
| 18 | CultriX/Qwen2.5-14B-Hyperionv4 | 19.87 | n/a |
| 19 | CohereForAI/c4ai-command-r-plus-08-2024 | 19.84 | n/a |
| 20 | EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 | 19.73 | n/a |
| 21 | CultriX/Qwen2.5-14B-partialmergept1 | 19.66 | n/a |
| 22 | EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2 | 19.63 | n/a |
| 23 | Cran-May/merge_model_20250308_2 | 19.52 | n/a |
| 24 | Aashraf995/QwenStock-14B | 19.27 | n/a |
| 25 | CultriX/Qwen2.5-14B-Brocav3 | 19.25 | n/a |
| 26 | Aryanne/QwentileSwap | 19.21 | n/a |
| 27 | CultriX/Qwen2.5-14B-Broca | 18.95 | n/a |
| 28 | CultriX/Qwen2.5-14B-Hyperionv3 | 18.92 | n/a |
| 29 | CultriX/Qwen2.5-14B-Brocav6 | 18.88 | n/a |
| 30 | CultriX/SeQwence-14Bv1 | 18.80 | n/a |
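The following sketch shows how an accuracy metric of this kind might be computed and turned into a ranking where higher is better. It assumes that each MuSR item is multiple choice and scored by exact match on the chosen option index. The names `Entry`, `accuracy`, and `rank` are hypothetical helpers introduced here for illustration, and the leaderboard's actual scoring pipeline may differ (for example, it may rescale raw accuracy). The three sample scores are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    accuracy: float  # percentage score; higher is better


def accuracy(predictions: list[int], gold: list[int]) -> float:
    """Hypothetical helper: percentage of items whose predicted option index matches the gold index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def rank(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Hypothetical helper: sort by accuracy, descending, and number positions from 1.

    Python's sort is stable, so tied scores keep their input order and still
    get distinct positions, as with ranks 15 and 16 in the table.
    """
    ordered = sorted(entries, key=lambda e: e.accuracy, reverse=True)
    return [(i + 1, e) for i, e in enumerate(ordered)]


# Usage with three scores copied from the table, deliberately given out of order.
board = rank([
    Entry("Danielbrdz/Barcenas-14b-phi-4", 24.24),
    Entry("CultriX/Qwen2.5-14B-ReasoningMerge", 25.61),
    Entry("DavidAU/DeepSeek-R1-Distill-Qwen-25.5B-Brainstorm", 25.24),
])
for position, entry in board:
    print(position, entry.model, f"{entry.accuracy:.2f}")
```

For example, `accuracy([0, 2, 1, 1], [0, 2, 0, 1])` returns `75.0`, since three of the four predictions match.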