AIME-24
Emerging (1 paper using it)
First seen: 2026
AIME-24 is a benchmark used to evaluate the mathematical reasoning capabilities of models, specifically in the context of critique-guided training approaches.