| # | Model | Elo | Paper |
|---|-------|-----|-------|
| 1 | DeepSeek-V3-Chat | 1216.89 | - |
| 2 | GPT-4o-2024-05-13 | 1216.72 | - |
| 3 | DeepSeek-V2-Chat (2024-06-28) | 1186.31 | - |
| 4 | DeepSeek-Coder-V2-Instruct | 1184.20 | - |
| 5 | Gemini-Exp-1114 | 1173.74 | - |
| 6 | Gemini-Exp-1206 | 1172.42 | - |
| 7 | Qwen2.5-Coder-32B-Instruct | 1168.91 | - |
| 8 | GPT-4-Turbo-2024-04-09 | 1162.95 | - |
| 9 | GPT-4o-2024-11-20 | 1156.35 | - |
| 10 | Claude-3.5-Sonnet-20240620 | 1146.48 | - |
| 11 | GPT-4-0613 | 1143.07 | - |
| 12 | Codestral-2501 | 1142.93 | - |
| 13 | Claude-3.5-Haiku-20241022 | 1142.85 | - |
| 14 | Gemini-2.0-Flash-Exp | 1142.47 | - |
| 15 | Llama-3.3-70B-Instruct | 1142.14 | - |
| 16 | GPT-4o-mini-2024-07-18 | 1141.20 | - |
| 17 | Athene-V2-Chat | 1140.81 | - |
| 18 | Claude-3-Opus-20240229 | 1132.72 | - |
| 19 | Athene-V2-Agent | 1128.42 | - |
| 20 | Hermes-2-Theta-Llama-3-70B | 1127.49 | - |
| 21 | Qwen2.5-72B-Instruct | 1125.66 | - |
| 22 | Gemini-Exp-1121 | 1123.33 | - |
| 23 | Gemini-1.5-Pro-API-0514 | 1123.08 | - |
| 24 | DeepSeek-V2.5-1210 | 1123.05 | - |
| 25 | Llama-3.1-70B-Instruct | 1122.56 | - |
| 26 | Phi-4 | 1119.78 | - |
| 27 | Claude-3.5-Sonnet-20241022 | 1112.66 | - |
| 28 | Gemini-1.5-Flash-API-0514 | 1105.38 | - |
| 29 | Llama-3-70B-Instruct | 1099.57 | - |
| 30 | Llama-3-70B-Synthia-v3.5 | 1096.57 | - |
BigCodeBench (Elo) Leaderboard