BigCode Models Leaderboard (HumanEval Python)
Single-shot HumanEval pass@1 (Python) for open-access code models from the BigCode Models Leaderboard. HumanEval is the canonical functional-correctness benchmark of 164 hand-written programming problems, evaluated here under one consistent harness. Pass@1 is the fraction (0–100%) of problems whose first generated sample passes all hidden unit tests; it is NOT a count of problems solved. Excludes agentic / iterative-debugging systems, multi-attempt (pass@k>1) and execution-feedback pipelines, and non-Python variants (HumanEval-C++/HumanEval-X), since those are not comparable to single-shot, single-model Python pass@1.

Metric: Pass@1 (higher is better)

| # | Model | Pass@1 | Paper |
|---|---|---|---|
| 1 | Nxcode-CQ-7B-orpo | 87.23 | link |
| 2 | CodeQwen1.5-7B-Chat | 87.20 | link |
| 3 | Qwen2.5-Coder-32B-Instruct | 83.20 | link |
| 4 | DeepSeek-Coder-7b-instruct | 80.22 | link |
| 5 | DeepSeek-Coder-33b-instruct | 80.02 | link |
| 6 | CodeFuse-DeepSeek-33b | 76.83 | link |
| 7 | CodeLlama-70b-Instruct | 75.60 | link |
| 8 | OpenCodeInterpreter-DS-33B | 75.23 | link |
| 9 | OpenCodeInterpreter-DS-6.7B | 73.20 | link |
| 10 | Phind-CodeLlama-34B-v2 | 71.95 | link |
| 11 | Artigenz-Coder-DS-6.7B | 70.89 | link |
| 12 | WizardCoder-Python-34B-V1.0 | 70.73 | link |
| 13 | Phind-CodeLlama-34B-Python-v1 | 70.22 | link |
| 14 | Phind-CodeLlama-34B-v1 | 65.85 | link |
| 15 | WizardCoder-Python-13B-V1.0 | 62.19 | link |
| 16 | WizardCoder-15B-V1.0 | 58.12 | link |
| 17 | Qwen2.5-Coder-32B | 57.10 | link |
| 18 | CodeLlama-70b-Python | 55.49 | link |
| 19 | CodeLlama-34b-Python | 53.29 | link |
| 20 | CodeGemma-7B-it | 52.74 | link |
| 21 | DeepSeek-Coder-33b-base | 52.45 | link |
| 22 | CodeLlama-70b | 52.44 | link |
| 23 | Phi-1 | 51.22 | link |
| 24 | CodeLlama-34b-Instruct | 50.79 | link |
| 25 | CodeQwen1.5-7B | 50.79 | link |
| 26 | CodeLlama-13b-Instruct | 50.60 | link |
| 27 | DeepSeek-Coder-7b-base | 45.83 | link |
| 28 | CodeLlama-7b-Instruct | 45.65 | link |
| 29 | OctoCoder-15B | 45.30 | link |
| 30 | CodeLlama-34b | 45.11 | link |
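
To make the metric definition above concrete, here is a minimal sketch of how a single-shot pass@1 percentage could be computed from per-problem outcomes. The function name `pass_at_1` and the sample results are hypothetical, chosen only for illustration; this is not the leaderboard's actual harness code, and the counts below do not correspond to any listed model.

```python
from typing import Sequence


def pass_at_1(first_sample_passed: Sequence[bool]) -> float:
    """Return pass@1 as a percentage in [0, 100].

    Each entry says whether the first generated sample for one problem
    passed all of that problem's unit tests (hypothetical input format).
    """
    if not first_sample_passed:
        raise ValueError("need results for at least one problem")
    # Fraction of problems solved on the first attempt, scaled to percent.
    return 100.0 * sum(first_sample_passed) / len(first_sample_passed)


# Hypothetical outcome: 100 of the 164 HumanEval problems pass on the first sample.
results = [True] * 100 + [False] * 64
print(f"{pass_at_1(results):.2f}")  # 60.98
```

Note that the result is a percentage, not a count: 100 solved problems out of 164 gives 60.98, which is why the scores in the table cannot be read as numbers of problems solved.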