| # | Model | % Resolved | Paper |
|---|-------|-----------:|-------|
| 1 | Claude 4.5 Opus (high reasoning) | 76.80 | link |
| 2 | Gemini 3 Flash (high reasoning) | 75.80 | link |
| 3 | MiniMax M2.5 (high reasoning) | 75.80 | link |
| 4 | Claude Opus 4.6 | 75.60 | link |
| 5 | Claude 4.5 Opus medium (20251101) | 74.40 | link |
| 6 | Gemini 3 Pro Preview (2025-11-18) | 74.20 | link |
| 7 | GLM-5 (high reasoning) | 72.80 | link |
| 8 | GPT 5.2 Codex | 72.80 | link |
| 9 | GPT-5-2 Codex | 72.80 | link |
| 10 | GPT-5-2 (high reasoning) | 72.80 | link |
| 11 | GPT-5.2 (2025-12-11) (high reasoning) | 71.80 | link |
| 12 | Claude 4.5 Sonnet (high reasoning) | 71.40 | link |
| 13 | Kimi K2.5 (high reasoning) | 70.80 | link |
| 14 | Claude 4.5 Sonnet (20250929) | 70.60 | link |
| 15 | DeepSeek V3.2 (high reasoning) | 70.00 | link |
| 16 | Gemini 3 Pro | 69.60 | link |
| 17 | GPT-5.2 (2025-12-11) | 69.00 | link |
| 18 | Claude 4 Opus (20250514) | 67.60 | link |
| 19 | Claude 4.5 Haiku (high reasoning) | 66.60 | link |
| 20 | GPT-5.1 (2025-11-13) (medium reasoning) | 66.00 | link |
| 21 | GPT-5.1-codex (medium reasoning) | 66.00 | link |
| 22 | GPT-5 (2025-08-07) (medium reasoning) | 65.00 | link |
| 23 | Claude 4 Sonnet (20250514) | 64.93 | link |
| 24 | Kimi K2 Thinking | 63.40 | link |
| 25 | Minimax M2 | 61.00 | link |
| 26 | DeepSeek V3.2 Reasoner | 60.00 | - |
| 27 | GPT-5 mini (2025-08-07) (medium reasoning) | 59.80 | link |
| 28 | o3 (2025-04-16) | 58.40 | link |
| 29 | Devstral small (2512) | 56.40 | - |
| 30 | GPT-5 Mini | 56.20 | link |
SWE-bench bash-only Leaderboard