AIME-25

Emerging

1 paper using it

First seen: 2026

AIME-25 is a benchmark used to evaluate the mathematical reasoning capabilities of models. In the paper listed here, it assesses how well models internalize reasoning processes when trained with critique-based methods.


Papers using AIME-25 (1)

Critique-Guided Distillation for Robust Reasoning via Refinement (2026)