| # | Model | Accuracy (%) | Paper |
|--:|:------|-------------:|:------|
| 1 | Gemini 3.1 Pro (Preview) | 98.00 | link |
| 2 | GPT-5.5 Pro (High) | 96.50 | link |
| 3 | Gemini 3 Deep Think (2/26) | 96.00 | link |
| 4 | GPT-5.5 (xHigh) | 95.00 | link |
| 5 | GPT-5.2 (Refine.) | 94.50 | link |
| 6 | GPT-5.4 Pro (xHigh) | 94.50 | link |
| 7 | Claude Opus 4.6 (120K, High) | 94.00 | link |
| 8 | GPT-5.4 (xHigh) | 93.67 | link |
| 9 | Claude 4.7 (High) | 93.50 | link |
| 10 | Claude Opus 4.8 (Max) | 92.50 | link |
| 11 | Gemini 3.5 Flash (High) | 92.50 | link |
| 12 | GPT-5.2 Pro (xHigh) | 90.50 | link |
| 13 | Grok 4.20 (Reasoning) | 89.50 | link |
| 14 | Gemini 3 Deep Think (Preview) ² | 87.50 | link |
| 15 | Claude Sonnet 4.6 (High) | 86.50 | link |
| 16 | GPT-5.2 (xHigh) | 86.17 | link |
| 17 | Gemini 3 Flash Preview (High) | 84.67 | link |
| 18 | Opus 4.5 (Thinking, 64K) | 80.00 | link |
| 19 | Grok 4 (Refine.) | 79.60 | link |
| 20 | GLM-5.2 | 77.00 | link |
| 21 | Gemini 3 Pro | 75.00 | link |
| 22 | GPT-5.1 (Thinking, High) | 72.83 | link |
| 23 | GPT-5 Pro | 70.17 | link |
| 24 | Grok 4 (Thinking) | 66.67 | link |
| 25 | GPT-5 (High) | 65.67 | link |
| 26 | Kimi K2.5 | 65.33 | link |
| 27 | Claude Sonnet 4.5 (Thinking 32K) | 63.67 | link |
| 28 | GPT-5.4 Mini (xHigh) | 63.67 | link |
| 29 | Minimax M2.5 | 63.67 | link |
| 30 | o3 (High) | 60.83 | link |