MMMU (Validation) Leaderboard (mmmu-val)
Massive Multi-discipline Multimodal Understanding: expert-level perception and reasoning across 30 college-level subjects (art, science, medicine, engineering). Validation-split accuracy as scored by the OpenVLM Leaderboard. · Metric: Accuracy (%, higher is better)
| # | Model | Accuracy | Paper |
|---|---|---|---|
| 1 | GPT-5-20250807 | 81.80 | link |
| 2 | GPT-5-mini-20250807 | 78.70 | link |
| 3 | SenseNova-V6-5-Pro | 77.00 | link |
| 4 | CongRong-v2.0 | 75.60 | link |
| 5 | Gemini-2.5-Pro | 74.70 | link |
| 6 | GPT-4.1-20250414 | 74.00 | link |
| 7 | ChatGPT-4o-latest | 72.90 | link |
| 8 | Gemini-2.0-Pro | 72.60 | link |
| 9 | GPT-5-nano-20250807 | 72.60 | link |
| 10 | InternVL3-78B | 72.20 | link |
| 11 | GPT-4.5 | 72.10 | link |
| 12 | Claude3.7-Sonnet | 71.00 | link |
| 13 | GPT-4o (1120, detail-high) | 70.70 | link |
| 14 | HunYuan-Standard-Vision | 70.70 | link |
| 15 | SenseNova-V6-Pro | 70.40 | link |
| 16 | InternVL2.5-78B | 70.00 | link |
| 17 | Gemini-2.0-Flash | 69.90 | link |
| 18 | GLM-4v-Plus-20250111 | 69.90 | link |
| 19 | GPT-4o (0806, detail-high) | 69.90 | link |
| 20 | Step-1o | 69.90 | link |
| 21 | InternVL3-38B | 69.70 | link |
| 22 | SenseNova | 69.60 | link |
| 23 | GPT-4o (0513, detail-high) | 69.20 | link |
| 24 | Qwen2.5-VL-32B | 68.90 | link |
| 25 | JT-VL-Chat-V3.0 | 68.70 | link |
| 26 | Gemini-1.5-Pro-002 | 68.60 | link |
| 27 | InternVL2.5-78B-MPO | 68.20 | link |
| 28 | Qwen2.5-VL-72B | 68.20 | link |
| 29 | grok-2-vision-1212 | 67.10 | link |
| 30 | Ovis2-34B | 66.70 | link |
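The Accuracy column is the percentage of validation questions a model answers correctly. Below is a minimal Python sketch of that calculation. It assumes each prediction has already been normalized to a short answer string (such as an option letter) and is compared to the gold answer by case-insensitive exact match. The function name `accuracy_percent` and the toy data are hypothetical. The OpenVLM Leaderboard's own answer-extraction and scoring pipeline may differ from this sketch.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Hypothetical helper: share of items whose normalized prediction equals
    the gold answer, expressed as a percentage rounded to two decimals.

    Assumes predictions are already reduced to short answer strings; the real
    leaderboard scorer may extract or match answers differently.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Case-insensitive exact match after trimming whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return round(100.0 * correct / len(answers), 2)


# Hypothetical toy data, not real MMMU items.
preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
print(accuracy_percent(preds, gold))  # 75.0
```

Rounding to two decimals only mirrors the table's display format; ranking is unaffected as long as every model is scored on the same question set.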