← all datasets

AIME-24

Emerging
1 paper using it
First seen: 2026

AIME-24 is a benchmark used to evaluate the mathematical reasoning capabilities of models, specifically in the context of critique-guided training approaches.

Papers using AIME-24 (1)
