BigCode Models Leaderboard (Win Rate)
BigCode Models Leaderboard: an open-access comparison of code LLMs on HumanEval (Python) and its MultiPL-E translations into other languages (Java, JavaScript, C++, and more). Win Rate is each model's win rate averaged over all evaluated languages, which gives a single aggregate ranking of open code models.

Metric: Win Rate (higher is better).
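The page does not spell out how each per-language win rate is computed. The sketch below is a minimal illustration that assumes three things. First, a model's win rate in one language is the share of pairwise comparisons it wins against every other model on that language's pass@1 score. Second, a tie counts as half a win. Third, Win Rate is the plain mean of those per-language rates. The model names, scores, tie rule, and the helpers `language_win_rate` and `average_win_rate` are all hypothetical. The leaderboard's actual formula and scale may differ, so this code is not expected to reproduce the values in the table.

```python
from statistics import mean

# Hypothetical data layout: language -> {model name: pass@1 score in percent}.
Scores = dict[str, dict[str, float]]


def language_win_rate(scores: dict[str, float], model: str) -> float:
    """Percentage of pairwise comparisons `model` wins within one language.

    Assumption (not from the leaderboard): a tie counts as half a win.
    """
    others = [s for m, s in scores.items() if m != model]
    if not others:
        return 0.0  # nothing to compare against
    mine = scores[model]
    wins = sum(1.0 if mine > s else 0.5 if mine == s else 0.0 for s in others)
    return 100.0 * wins / len(others)


def average_win_rate(per_language: Scores, model: str) -> float:
    """Unweighted mean of the per-language win rates over all languages."""
    return mean(language_win_rate(scores, model) for scores in per_language.values())


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT leaderboard results.
    toy = {
        "python": {"model-a": 60.0, "model-b": 50.0, "model-c": 40.0},
        "java": {"model-a": 45.0, "model-b": 55.0, "model-c": 45.0},
    }
    for name in ("model-a", "model-b", "model-c"):
        print(name, round(average_win_rate(toy, name), 2))
    # prints: model-a 62.5, model-b 75.0, model-c 12.5 (one per line)
```

In this sketch, each per-language rate depends only on how the models are ordered within that language. A language whose pass@1 scores are uniformly higher or lower than the others therefore carries no extra weight in the average.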
| # | Model | Win Rate | Paper |
|---|---|---|---|
| 1 | Qwen2.5-Coder-32B-Instruct | 59.17 | link |
| 2 | Qwen2.5-Coder-32B | 56.67 | link |
| 3 | OpenCodeInterpreter-DS-33B | 56.25 | link |
| 4 | Nxcode-CQ-7B-orpo | 55.92 | link |
| 5 | CodeQwen1.5-7B-Chat | 55.67 | link |
| 6 | CodeFuse-DeepSeek-33b | 54.67 | link |
| 7 | DeepSeek-Coder-33b-instruct | 52.25 | link |
| 8 | Artigenz-Coder-DS-6.7B | 51.67 | link |
| 9 | DeepSeek-Coder-7b-instruct | 50.58 | link |
| 10 | OpenCodeInterpreter-DS-6.7B | 49.75 | link |
| 11 | Phind-CodeLlama-34B-v2 | 49.42 | link |
| 12 | Phind-CodeLlama-34B-v1 | 48.15 | link |
| 13 | Phind-CodeLlama-34B-Python-v1 | 46.65 | link |
| 14 | CodeQwen1.5-7B | 45.17 | link |
| 15 | CodeLlama-70b-Instruct | 43.58 | link |
| 16 | WizardCoder-Python-34B-V1.0 | 43.58 | link |
| 17 | CodeLlama-70b | 43.21 | link |
| 18 | DeepSeek-Coder-33b-base | 42.75 | link |
| 19 | CodeLlama-70b-Python | 42.33 | link |
| 20 | StarCoder2-15B | 39.92 | link |
| 21 | CodeLlama-34b-Instruct | 38.65 | link |
| 22 | WizardCoder-Python-13B-V1.0 | 38.35 | link |
| 23 | DeepSeek-Coder-7b-base | 38.33 | link |
| 24 | CodeLlama-34b | 37.81 | link |
| 25 | CodeLlama-34b-Python | 36.96 | link |
| 26 | WizardCoder-15B-V1.0 | 35.31 | link |
| 27 | CodeLlama-13b-Instruct | 34.35 | link |
| 28 | CodeGemma-7B | 33.83 | link |
| 29 | CodeLlama-13b | 32.27 | link |
| 30 | CodeLlama-13b-Python | 30.19 | link |