MMStar Leaderboard
Vision-indispensable multimodal benchmark: 1,500 samples hand-filtered so that every question genuinely requires the image, with text-only-solvable and data-leakage cases removed. Scores are overall accuracy. · Metric: Accuracy (%, higher is better)
| # | Model | Accuracy (%) | Paper |
|---|---|---|---|
| 1 | JT-VL-Chat-V3.0 | 82.10 | link |
| 2 | BlueLM-2.6-3B | 80.10 | link |
| 3 | SenseNova-V6-5-Pro | 76.10 | link |
| 4 | GPT-5-20250807 | 75.70 | link |
| 5 | CongRong-v2.0 | 75.30 | link |
| 6 | SenseNova-V6-Pro | 73.70 | link |
| 7 | Gemini-2.5-Pro | 73.60 | link |
| 8 | InternVL3-78B | 73.40 | link |
| 9 | GPT-5-mini-20250807 | 73.20 | link |
| 10 | HunYuan-Standard-Vision | 72.90 | link |
| 11 | SenseNova | 72.70 | link |
| 12 | InternVL3-38B | 72.60 | link |
| 13 | R-4B | 72.60 | link |
| 14 | GLM-4v-Plus-20250111 | 72.50 | link |
| 15 | InternVL2.5-78B-MPO | 72.10 | link |
| 16 | Ola-7b | 70.80 | link |
| 17 | TeleMM | 70.80 | link |
| 18 | Qwen2.5-VL-72B | 70.50 | link |
| 19 | Qwen2.5-VL-32B | 70.30 | link |
| 20 | ChatGPT-4o-latest | 70.20 | link |
| 21 | InternVL2.5-38B-MPO | 70.10 | link |
| 22 | Kimi-VL-A3B-Thinking-2506 | 70.00 | link |
| 23 | GPT-4.1-20250414 | 69.80 | link |
| 24 | SAIL-VL-1.5-8B | 69.70 | link |
| 25 | InternVL2.5-78B | 69.50 | link |
| 26 | SAIL-VL-1.6-8B | 69.50 | link |
| 27 | BailingMM-Pro-0120 | 69.40 | link |
| 28 | Gemini-2.0-Flash | 69.40 | link |
| 29 | GPT-4.5 | 69.30 | link |
| 30 | Step-1o | 69.30 | link |
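Overall accuracy is the share of the 1,500 samples a model answers correctly. The sketch below shows how to compute it and how to map a reported score back to the correct-answer counts that could produce it. It is a minimal illustration, not the benchmark's official scorer. It assumes each sample is scored simply as correct or incorrect and that the published scores are rounded to one decimal place, since every score in the table ends in 0. The function names `overall_accuracy` and `correct_counts_for` are hypothetical.

```python
NUM_SAMPLES = 1500  # MMStar size as stated in the description above


def overall_accuracy(correct: list[bool]) -> float:
    """Percentage of samples answered correctly (higher is better)."""
    if not correct:
        raise ValueError("no samples to score")
    return 100.0 * sum(correct) / len(correct)


def correct_counts_for(score: float, n: int = NUM_SAMPLES, decimals: int = 1) -> list[int]:
    """Return every correct-answer count k whose accuracy 100*k/n
    lies within the rounding half-width of `score`.

    Assumption (hypothetical): `score` was rounded to `decimals` places.
    """
    half_width = 0.5 * 10 ** (-decimals)
    return [
        k for k in range(n + 1)
        if abs(100.0 * k / n - score) <= half_width + 1e-9  # small epsilon for float error
    ]


if __name__ == "__main__":
    print(overall_accuracy([True, False, True, True]))  # 75.0
    print(correct_counts_for(82.10))  # counts consistent with the top score
    print(correct_counts_for(72.60))  # counts consistent with ranks 12 and 13
```

On a 1,500-sample set, each question is worth about 0.067 points, so a 0.1-point gap between adjacent entries corresponds to roughly 1.5 questions. Under the one-decimal assumption, tied entries such as ranks 12 and 13 (both 72.60) can be ordered only by a tie-break that the table does not state.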