SWE-bench Verified Leaderboard
SWE-bench Verified: 500 real GitHub issues from 12 popular Python repositories, each validated by human annotators. It is a frontier benchmark for AI software engineering agents. % Resolved is the percentage of the 500 tasks for which the agent's patch resolves the issue, as judged by a hidden test suite that the patch must pass without breaking previously passing tests. Metric: % Resolved (higher is better).
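As a rough illustration of how a % Resolved figure can be computed, the sketch below scores per-task results against the full 500-task split. It assumes the usual SWE-bench criterion that a task counts as resolved only when its FAIL_TO_PASS tests now pass and its PASS_TO_PASS tests still pass. The `InstanceResult` record and its field names are hypothetical and do not reflect the official harness's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceResult:
    # Hypothetical per-task record; field names are illustrative only.
    instance_id: str
    fail_to_pass_ok: bool  # every test that failed before the patch now passes
    pass_to_pass_ok: bool  # every test that passed before the patch still passes


def is_resolved(result: InstanceResult) -> bool:
    # A task counts only if the patch fixes the issue AND breaks nothing else.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def percent_resolved(results: list[InstanceResult], total_instances: int = 500) -> float:
    # The denominator is the whole split, so tasks with no result count as unresolved.
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    # Collect unique IDs so a duplicated result cannot be counted twice.
    resolved_ids = {r.instance_id for r in results if is_resolved(r)}
    return round(100.0 * len(resolved_ids) / total_instances, 2)


# Usage: 10 results, half of which break a previously passing test.
results = [InstanceResult(f"task-{i}", True, i % 2 == 0) for i in range(10)]
print(percent_resolved(results))  # 5 of 500 tasks -> 1.0
```

Dividing by the fixed split size rather than by the number of submitted results keeps scores comparable: an agent that crashes or times out on a task is simply credited with zero for it.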
| # | Model | % Resolved | Paper |
|---|---|---|---|
| 1 | live-SWE-agent + Claude 4.5 Opus medium (20251101) | 79.20 | link |
| 2 | Sonar Foundation Agent + Claude 4.5 Opus | 79.20 | link |
| 3 | TRAE + Doubao-Seed-Code | 78.80 | link |
| 4 | live-SWE-agent + Gemini 3 Pro Preview (2025-11-18) | 77.40 | link |
| 5 | Atlassian Rovo Dev (2025-09-02) | 76.80 | link |
| 6 | EPAM AI/Run Developer Agent v20250719 + Claude 4 Sonnet | 76.80 | link |
| 7 | mini-SWE-agent + Claude 4.5 Opus (high reasoning) | 76.80 | link |
| 8 | ACoder | 76.40 | link |
| 9 | mini-SWE-agent + Gemini 3 Flash (high reasoning) | 75.80 | link |
| 10 | mini-SWE-agent + MiniMax M2.5 (high reasoning) | 75.80 | link |
| 11 | mini-SWE-agent + Claude Opus 4.6 | 75.60 | link |
| 12 | Warp | 75.60 | link |
| 13 | TRAE + Claude Sonnet 4 + Opus 4 + Sonnet 3.7 + Gemini 2.5 Pro | 75.20 | link |
| 14 | Harness AI | 74.80 | link |
| 15 | Sonar Foundation Agent + Claude 4.5 Sonnet | 74.80 | link |
| 16 | JoyCode + Claude 4 Sonnet + GPT-4.1 | 74.60 | link |
| 17 | Lingxi-v1.5_claude-4-sonnet-20250514 | 74.60 | link |
| 18 | mini-SWE-agent + Claude 4.5 Opus medium (20251101) | 74.40 | link |
| 19 | Prometheus-v1.2.1 + GPT-5 | 74.40 | link |
| 20 | Refact.ai Agent + Claude 4 Sonnet + o4-mini | 74.40 | link |
| 21 | mini-SWE-agent + Gemini 3 Pro Preview (2025-11-18) | 74.20 | link |
| 22 | Salesforce AI Research SAGE (OpenHands) | 73.80 | link |
| 23 | Tools + Claude 4 Opus (2025-05-22) | 73.20 | link |
| 24 | Salesforce AI Research SAGE (bash-only) | 73.00 | link |
| 25 | mini-SWE-agent + GLM-5 (high reasoning) | 72.80 | link |
| 26 | mini-SWE-agent + GPT-5-2 Codex | 72.80 | link |
| 27 | mini-SWE-agent + GPT-5-2 (high reasoning) | 72.80 | link |
| 28 | Tools + Claude 4 Sonnet (2025-05-22) | 72.40 | link |
| 29 | mini-SWE-agent + GPT-5.2 (2025-12-11) (high reasoning) | 71.80 | link |
| 30 | OpenHands + GPT-5 | 71.80 | link |
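Because the denominator is fixed at 500, each 0.2-point step in the table corresponds to exactly one task: the 79.20 at the top is 396 resolved issues and the 71.80 at row 30 is 359. The `#` column numbers rows sequentially even where scores tie. The sketch below converts scores back to task counts and assigns tie-aware standard competition ranks (1, 1, 3, ...). The helper names `resolved_count` and `competition_ranks` are hypothetical.

```python
def resolved_count(percent: float, total_instances: int = 500) -> int:
    # Convert a % Resolved score back to a whole number of tasks.
    return round(percent * total_instances / 100)


def competition_ranks(scores: list[float]) -> list[int]:
    # Standard competition ranking: tied scores share a rank and the next rank skips.
    return [1 + sum(1 for other in scores if other > s) for s in scores]


top = [79.20, 79.20, 78.80, 77.40, 76.80, 76.80, 76.80]  # first seven rows above
print([resolved_count(s) for s in top])  # [396, 396, 394, 387, 384, 384, 384]
print(competition_ranks(top))            # [1, 1, 3, 4, 5, 5, 5]
```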