BigCodeBench (Complete) Leaderboard
BigCodeBench Complete subset: code completion across 1,140 diverse function-level tasks from 139 libraries. Measures whether models can finish realistic code given partial context. Metric: Pass@1 (higher is better).
| # | Model | Pass@1 (%) | Paper |
|---|---|---|---|
| 1 | Gemini-Exp-1206 | 62.40 | - |
| 2 | DeepSeek-V3 | 62.20 | - |
| 3 | Llama-4-Maverick | 61.40 | - |
| 4 | GPT-4o-2024-05-13 | 61.10 | - |
| 5 | Quasar-Alpha | 60.60 | - |
| 6 | Gemini-2.0-Flash-Exp | 59.90 | - |
| 7 | DeepSeek-Coder-V2-Instruct | 59.70 | - |
| 8 | DeepSeek-V2-Chat (2024-06-28) | 59.40 | - |
| 9 | Gemini-Exp-1114 | 59.30 | - |
| 10 | GPT-4.1-Mini-2025-04-14 | 59.30 | - |
| 11 | Claude-3.5-Haiku-20241022 | 59.00 | - |
| 12 | GPT-4o-2024-11-20 | 58.90 | - |
| 13 | Claude-3.5-Sonnet-20240620 | 58.60 | - |
| 14 | GPT-4-Turbo-2024-04-09 | 58.20 | - |
| 15 | Gemini-Exp-1121 | 58.10 | - |
| 16 | Qwen2.5-Coder-32B-Instruct | 58.00 | - |
| 17 | Claude-3.5-Sonnet-20241022 | 57.50 | - |
| 18 | Gemini-1.5-Pro-API-0514 | 57.50 | - |
| 19 | Llama-3.3-70B-Instruct | 57.50 | - |
| 20 | Claude-3-Opus-20240229 | 57.40 | - |
| 21 | GPT-4o-mini-2024-07-18 | 57.40 | - |
| 22 | GPT-4-0613 | 57.20 | - |
| 23 | Athene-V2-Chat | 56.80 | - |
| 24 | Qwen2.5-Coder-14B-Instruct | 56.70 | - |
| 25 | Athene-V2-Agent | 56.10 | - |
| 26 | Qwen2.5-72B-Instruct | 55.90 | - |
| 27 | Hermes-2-Theta-Llama-3-70B | 55.60 | - |
| 28 | Phi-4 | 55.40 | - |
| 29 | Gemini-1.5-Flash-API-0514 | 55.10 | - |
| 30 | DeepSeek-R1-Distill-Qwen-32B | 54.90 | - |
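
To make the metric concrete, the sketch below treats Pass@1 as the percentage of tasks whose single generated solution passes all of its tests. The function name `pass_at_1`, the task IDs, and the outcomes are hypothetical illustrations, not part of the benchmark's own tooling or data.

```python
from typing import Mapping


def pass_at_1(results: Mapping[str, bool]) -> float:
    """Return the percentage of tasks whose one sampled solution passed.

    `results` maps a task ID to True if the single generated solution
    passed every test for that task, and False otherwise.
    """
    if not results:
        raise ValueError("no task results provided")
    solved = sum(1 for passed in results.values() if passed)
    return 100.0 * solved / len(results)


# Hypothetical outcomes for four made-up task IDs (not real benchmark data).
example = {"task/0": True, "task/1": False, "task/2": True, "task/3": True}
print(f"{pass_at_1(example):.2f}")  # prints 75.00
```

Under this reading, a score of 62.40 on 1,140 tasks corresponds to roughly 711 solved tasks.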