SWE-bench Multimodal Leaderboard
SWE-bench Multimodal consists of issues drawn from JavaScript front-end repositories where the task includes images (screenshots, design mocks), so resolving the bug requires reasoning over both code and visual context. The score is the percentage of issues resolved. Metric: % Resolved (higher is better)
| # | Model | % Resolved | Paper |
|---|---|---|---|
| 1 | Codefuse_Pycfuse_SVR | 35.98 | link |
| 2 | GUIRepair + o3 (2025-04-16) | 35.98 | link |
| 3 | Refact.ai Agent | 35.59 | link |
| 4 | OpenHands-Versa (Claude-Sonnet 4) | 34.43 | link |
| 5 | GUIRepair + o4-mini (2025-04-16) | 33.85 | link |
| 6 | OpenHands-Versa (Claude-3.7 Sonnet) | 31.33 | link |
| 7 | GUIRepair + GPT 4.1 (2025-04-14) | 31.14 | link |
| 8 | Zencoder (2025-04-01) | 30.56 | link |
| 9 | GUIRepair + GPT 4o (2024-08-06) | 30.37 | link |
| 10 | Globant Code Fixer Agent | 29.59 | link |
| 11 | Zencoder (2025-03-10) | 27.08 | link |
| 12 | Agentless Lite + Claude-3.5 Sonnet | 25.34 | link |
| 13 | SWE-agent + Claude Sonnet 3.5 | 12.19 | link |
| 14 | SWE-agent Multimodal + GPT 4o (2024-08-06) | 12.19 | link |
| 15 | SWE-agent + GPT 4o (2024-08-06) | 11.99 | link |
| 16 | SWE-agent JavaScript + Claude Sonnet 3.5 | 11.99 | link |
| 17 | SWE-agent Multimodal + Claude 3.5 Sonnet | 11.41 | link |
| 18 | SWE-agent JavaScript + GPT 4o (2024-08-06) | 9.28 | link |
| 19 | Agentless + Claude 3.5 Sonnet | 6.19 | link |
| 20 | RAG + GPT 4o (2024-08-06) | 6.00 | link |
| 21 | RAG + Claude 3.5 Sonnet | 5.03 | link |
| 22 | Agentless + GPT 4o (2024-08-06) | 3.09 | link |
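
The % Resolved metric used in the table above can be sketched as a simple ratio. This is a minimal illustration, not the benchmark's official evaluation harness: the function name `percent_resolved`, the `outcomes` mapping, and the issue IDs are all hypothetical, and rounding to two decimals is assumed only because the table reports two decimals.

```python
from typing import Mapping


def percent_resolved(outcomes: Mapping[str, bool]) -> float:
    """Return the share of resolved issues as a percentage, rounded to 2 decimals.

    `outcomes` maps a (hypothetical) issue ID to True if the submitted patch
    resolved that issue and False otherwise.
    """
    if not outcomes:
        raise ValueError("outcomes must contain at least one issue")
    resolved = sum(1 for ok in outcomes.values() if ok)
    return round(100.0 * resolved / len(outcomes), 2)


# Hypothetical per-issue results for one system (not real benchmark data).
example_outcomes = {
    "issue-a": True,
    "issue-b": False,
    "issue-c": True,
    "issue-d": False,
    "issue-e": False,
    "issue-f": True,
    "issue-g": False,
    "issue-h": False,
}

score = percent_resolved(example_outcomes)  # 3 of 8 issues resolved
```

Because every system is scored over the same set of issues, two entries that resolve the same number of issues get identical scores even when they resolve different issues.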