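| #  | Model                    | % Resolved | Paper |
|----|--------------------------|------------|-------|
| 1  | Gemini 3 Flash           | 72.70      | link  |
| 2  | Claude 4.6 Opus          | 72.00      | link  |
| 3  | Claude 4.5 Opus          | 70.70      | link  |
| 4  | GLM-5                    | 69.70      | link  |
| 5  | Gemini 3 Pro             | 68.70      | link  |
| 6  | Minimax 2.5              | 68.30      | link  |
| 7  | Kimi K2.5                | 67.30      | link  |
| 8  | Claude 4.5 Sonnet        | 67.00      | link  |
| 9  | GPT-5.2 (high reasoning) | 66.70      | link  |
| 10 | GPT 5.2 Codex            | 66.30      | link  |
| 11 | GPT-5-2 Codex            | 66.30      | link  |
| 12 | Claude 4.5 Haiku         | 64.70      | link  |
| 13 | DeepSeek V3.2            | 59.00      | link  |
| 14 | GPT-5 mini               | 39.70      | link  |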
SWE-bench Multilingual Leaderboard