AIME-25

Emerging
1 paper using it
First seen: 2026

AIME-25 is a benchmark for evaluating the mathematical reasoning of models. In the paper indexed here, it is used to assess how well models internalize reasoning processes when trained with critique-based methods.

Papers using AIME-25 (1)
