AIME-25
Emerging
1 paper using it
First seen: 2026
AIME-25 is a benchmark used to evaluate the mathematical reasoning capabilities of models. In the work that uses it, it assesses how well models internalize reasoning processes when trained with critique-based methods.