AI2D Leaderboard
AI2 Diagrams (AI2D): multiple-choice question answering over grade-school science diagrams covering parts, relationships, and processes. Scores are overall accuracy as computed by OpenCompass VLMEvalKit. Metric: accuracy in percent (higher is better).
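Overall accuracy on a multiple-choice benchmark is the share of questions whose predicted option matches the gold option, expressed as a percentage. The Python below is a minimal sketch that assumes predictions have already been reduced to option letters and compares them by case-insensitive exact match. A real evaluation harness typically has to extract the chosen option from free-form model output first, and this sketch skips that step. The function name `multiple_choice_accuracy` and the sample data are hypothetical.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return accuracy as a percentage (0-100).

    Assumes each prediction is already a bare option letter (e.g. "A");
    matching is case-insensitive exact match after trimming whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Count questions where the predicted letter equals the gold letter.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical predictions for four questions with options A-D.
preds = ["A", "c", "B", "D"]
gold = ["A", "C", "D", "D"]
print(f"{multiple_choice_accuracy(preds, gold):.2f}")  # 75.00
```

Three of the four hypothetical answers match, giving 75.00, on the same 0-100 scale as the scores in the table.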
| # | Model | Accuracy (%) | Paper |
|---|---|---|---|
| 1 | HunYuan-Standard-Vision | 93.20 | link |
| 2 | Taiyi | 90.80 | link |
| 3 | SenseNova-V6-5-Pro | 90.20 | link |
| 4 | CongRong-v2.0 | 90.00 | link |
| 5 | InternVL3-78B | 89.80 | link |
| 6 | Gemini-2.5-Pro | 89.50 | link |
| 7 | GPT-5-20250807 | 89.50 | link |
| 8 | InternVL2.5-78B-MPO | 89.20 | link |
| 9 | SenseNova-V6-Pro | 89.20 | link |
| 10 | InternVL2.5-78B | 89.10 | link |
| 11 | Step-1o | 89.10 | link |
| 12 | MUG-U-7B | 88.90 | link |
| 13 | InternVL3-38B | 88.70 | link |
| 14 | Qwen2.5-VL-72B | 88.50 | link |
| 15 | TeleMM | 88.50 | link |
| 16 | JT-VL-Chat-V3.0 | 88.30 | link |
| 17 | Ovis2-34B | 88.30 | link |
| 18 | Qwen2-VL-72B | 88.30 | link |
| 19 | Qwen-VL-Max-0809 | 88.10 | link |
| 20 | InternVL2.5-38B-MPO | 87.90 | link |
| 21 | SenseNova | 87.80 | link |
| 22 | InternVL2.5-38B | 87.60 | link |
| 23 | InternVL2-Llama3-76B | 87.60 | link |
| 24 | SAIL-VL-1.5-8B | 87.50 | link |
| 25 | SAIL-VL-1.6-8B | 87.50 | link |
| 26 | Step-1.5V | 87.50 | link |
| 27 | BailingMM-Pro-0120 | 87.20 | link |
| 28 | GPT-4.5 | 87.20 | link |
| 29 | InternVL2-40B | 86.80 | link |
| 30 | GLM-4v-Plus-20250111 | 86.70 | link |