RealWorldQA Leaderboard
Real-world spatial-understanding benchmark for multimodal models: photographs (including driving scenes) probing whether a model grasps physical layout, orientation and everyday visual common sense. Accuracy as scored by the OpenVLM Leaderboard. · Metric: Accuracy (higher is better)
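
As a rough illustration of the metric, the sketch below computes accuracy as the percentage of answers that match the reference. It assumes case-insensitive exact matching; the matching rules the OpenVLM Leaderboard actually applies are not described here, and the function name `accuracy` and the sample data are hypothetical.

```python
def accuracy(predictions, answers):
    """Percentage of predictions that match the reference answers.

    Assumption: a prediction counts as correct when it equals the
    reference after trimming whitespace and ignoring case. The real
    leaderboard may use a different matching rule.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return round(100.0 * correct / len(answers), 2)


# Hypothetical model outputs and reference answers, for illustration only.
preds = ["A", "left", "3", "yes"]
refs = ["A", "right", "3", "Yes"]
score = accuracy(preds, refs)  # 3 of 4 match
```

Under this reading, a table entry such as 78.70 would mean that 78.70% of the benchmark questions were answered correctly.
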
| # | Model | Accuracy (%) | Paper |
|---|---|---|---|
| 1 | GPT-4.1-20250414 | 78.70 | link |
| 2 | InternVL3-78B | 78.40 | link |
| 3 | GPT-4.1-mini-20250414 | 77.50 | link |
| 4 | InternVL2.5-78B-MPO | 77.40 | link |
| 5 | Qwen2-VL-72B | 76.70 | link |
| 6 | GPT-4o (0806, detail-high) | 76.50 | link |
| 7 | InternVL3-38B | 75.80 | link |
| 8 | Ovis2-34B | 75.60 | link |
| 9 | GPT-4o (0513, detail-high) | 75.40 | link |
| 10 | Qwen2.5-VL-72B | 75.30 | link |
| 11 | Gemini-2.0-Pro | 74.80 | link |
| 12 | InternVL2.5-38B-MPO | 74.40 | link |
| 13 | Qwen-VL-Max-0809 | 74.20 | link |
| 14 | Ovis2-16B | 74.10 | link |
| 15 | Step-1o | 74.10 | link |
| 16 | LLaVA-OneVision-72B | 73.90 | link |
| 17 | InternVL2.5-26B-MPO | 73.70 | link |
| 18 | LLaVA-OneVision-72B (SI) | 73.70 | link |
| 19 | Molmo-72B | 73.70 | link |
| 20 | Taiyi | 73.10 | link |
| 21 | InternVL2-Llama3-76B | 72.70 | link |
| 22 | Ovis1.6-Gemma2-27B | 72.70 | link |
| 23 | Ovis2-8B | 72.50 | link |
| 24 | VARCO-VISION-14B | 72.50 | link |
| 25 | Gemini-2.0-Flash | 72.30 | link |
| 26 | GLM-4v-Plus-20250111 | 72.20 | link |
| 27 | MUG-U-7B | 71.60 | link |
| 28 | InternVL3-8B | 71.40 | link |
| 29 | InternVL2.5-8B-MPO | 71.10 | link |
| 30 | Ovis2-4B | 71.10 | link |