BigCodeBench (Instruct) Leaderboard

Rank	Model	Pass@1 (%)	Paper
1	GPT-4o-2024-05-13	51.10	—
2	DeepSeek-V3	50.00	—
3	Llama-4-Maverick	49.70	—
4	Quasar-Alpha	49.60	—
5	Gemini-Exp-1114	49.20	—
6	Qwen2.5-Coder-32B-Instruct	49.00	—
7	DeepSeek-V2-Chat (2024-06-28)	48.90	—
8	GPT-4.1-Mini-2025-04-14	48.90	—
9	DeepSeek-V2.5-1210	48.60	—
10	DeepSeek-Coder-V2-Instruct	48.20	—
11	GPT-4-Turbo-2024-04-09	48.20	—
12	Qwen2.5-Coder-14B-Instruct	48.20	—
13	GPT-4o-2024-11-20	48.00	—
14	Athene-V2-Chat	47.20	—
15	Gemini-Exp-1206	47.00	—
16	Llama-3.3-70B-Instruct	46.90	—
17	Claude-3.5-Sonnet-20240620	46.80	—
18	Athene-V2-Agent	46.20	—
19	Claude-3.5-Haiku-20241022	46.10	—
20	GPT-4o-mini-2024-07-18	46.10	—
21	Llama-3.1-70B-Instruct	46.10	—
22	GPT-4-0613	46.00	—
23	Gemini-2.0-Flash-Exp	45.90	—
24	Qwen2.5-72B-Instruct	45.80	—
25	Hermes-2-Theta-Llama-3-70B	45.60	—
26	Claude-3-Opus-20240229	45.50	—
27	Phi-4	45.50	—
28	Gemini-Exp-1121	45.40	—
29	Mistral-Small-24B-Instruct-2501	45.30	—
30	Sky-T1-32B-Flash	45.10	—
