| # | Model | Accuracy (%) | Paper |
|---:|---|---:|---|
| 1 | GPT-5.5 (xHigh) | 85.00 | link |
| 2 | Gemini 3 Deep Think (2/26) | 84.58 | link |
| 3 | GPT-5.5 Pro (High) | 84.58 | link |
| 4 | GPT-5.4 Pro (xHigh) | 83.33 | link |
| 5 | Gemini 3.1 Pro (Preview) | 77.08 | link |
| 6 | Claude 4.7 (Max) | 75.83 | link |
| 7 | GPT-5.4 (xHigh) | 73.95 | link |
| 8 | GPT-5.2 (Refine.) | 72.90 | link |
| 9 | Claude Opus 4.8 (High) | 72.08 | link |
| 10 | Gemini 3.5 Flash (High) | 72.08 | link |
| 11 | Claude Opus 4.6 (120K, High) | 69.17 | link |
| 12 | Grok 4.20 (Reasoning) | 65.14 | link |
| 13 | Claude Sonnet 4.6 (High) | 60.42 | link |
| 14 | GPT-5.2 Pro (High) | 54.16 | link |
| 15 | Gemini 3 Pro (Refine.) | 54.00 | link |
| 16 | GPT-5.2 (xHigh) | 52.91 | link |
| 17 | Gemini 3 Deep Think (Preview)² | 45.14 | link |
| 18 | Opus 4.5 (Thinking, 64K) | 37.64 | link |
| 19 | Gemini 3 Flash Preview (High) | 33.61 | link |
| 20 | Gemini 3 Pro | 31.11 | link |
| 21 | Grok 4 (Refine.) | 29.44 | link |
| 22 | GLM-5.2 | 22.78 | link |
| 23 | GPT-5.4 Mini (xHigh) | 18.90 | link |
| 24 | GPT-5 Pro | 18.33 | link |
| 25 | GPT-5.1 (Thinking, High) | 17.64 | link |
| 26 | Grok 4 (Thinking) | 15.97 | link |
| 27 | Claude Sonnet 4.5 (Thinking 32K) | 13.61 | link |
| 28 | Kimi K2.5 | 11.81 | link |
| 29 | GPT-5 (High) | 9.86 | link |
| 30 | Claude Opus 4 (Thinking 16K) | 8.61 | link |