Chatbot Arena (LMSYS) Leaderboard
Crowdsourced head-to-head LLM comparison - Elo rating from human preference votes · Metric: Elo (higher is better)
| # | Model | Elo | Paper |
|---|---|---|---|
| 1 | gpt-5.5 | 1543.99 | - |
| 2 | qwen3.5-max-preview | 1536.99 | - |
| 3 | claude-opus-4-6 | 1534.61 | - |
| 4 | claude-opus-4-6-thinking | 1533.45 | - |
| 5 | kimi-k2.6 | 1530.78 | - |
| 6 | claude-fable-5 | 1530.52 | - |
| 7 | gpt-5.5-high | 1529.98 | - |
| 8 | gemini-3-pro | 1528.90 | - |
| 9 | gemini-3.1-pro-preview | 1526.94 | - |
| 10 | qwen3.7-max-preview | 1525.85 | - |
| 11 | claude-opus-4-7-thinking | 1520.19 | - |
| 12 | glm-5 | 1518.41 | - |
| 13 | qwen3.7-plus | 1517.90 | - |
| 14 | gpt-5.4-high | 1517.32 | - |
| 15 | claude-opus-4-7 | 1517.28 | - |
| 16 | gemini-3.5-flash | 1516.06 | - |
| 17 | gemini-3-flash | 1513.75 | - |
| 18 | gpt-5.4 | 1513.46 | - |
| 19 | claude-opus-4-8 | 1512.24 | - |
| 20 | glm-5.2 (max) | 1511.93 | - |
| 21 | glm-5.1 | 1511.42 | - |
| 22 | gemini-2.5-pro | 1506.12 | - |
| 23 | ernie-5.0-preview-1022 | 1505.57 | - |
| 24 | muse-spark | 1504.98 | - |
| 25 | qwen3.5-397b-a17b | 1504.83 | - |
| 26 | mimo-v2.5-pro | 1503.97 | - |
| 27 | Claude Opus 4.7 | 1395.00 | - |
| 28 | GPT-5 | 1382.00 | - |
| 29 | Claude Sonnet 4.6 | 1378.00 | - |
| 30 | Gemini 2.5 Pro | 1370.00 | - |
| 31 | GPT-4o (2024-11-20) | 1340.00 | - |
| 32 | DeepSeek-V3 | 1330.00 | - |
| 33 | Claude Haiku 4.5 | 1310.00 | - |
| 34 | Qwen 2.5 Max | 1290.00 | - |
| 35 | Claude 3.5 Sonnet | 1280.00 | - |
| 36 | Llama 4 405B Instruct | 1280.00 | - |
| 37 | Gemini 1.5 Pro (2024-02) | 1275.00 | - |
| 38 | Llama 3 70B Instruct | 1208.00 | link |
| 39 | Mixtral 8x22B Instruct | 1148.00 | - |
| 40 | Chatbot Arena (introducing) | 0.00 | - |
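
To make the Elo column easier to interpret, the sketch below shows the conventional Elo logistic model: how a rating gap maps to an expected preference rate, and how a single vote would shift two ratings. It is an illustration only and does not claim to reproduce how this leaderboard's scores were fitted. The function names `expected_score` and `update`, the K-factor of 4, and the base-10 / 400-point scale are assumptions taken from the textbook Elo formulation.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B (textbook Elo logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 4.0):
    """Return both ratings after one vote.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (hypothetical value) controls how far a single vote moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Ratings copied from rows 1 and 2 of the table above.
p = expected_score(1543.99, 1536.99)
print(f"{p:.3f}")  # about 0.510
```

Under this model a 7-point gap corresponds to roughly a 51% expected preference rate, so small differences near the top of the table indicate models that voters prefer almost equally often.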