AIME-24

Emerging

1 paper using it

First seen: 2026

AIME-24 is a benchmark used to evaluate the mathematical reasoning capabilities of models. In the paper indexed here, it is used to assess critique-guided training approaches.


Papers using AIME-24 (1)

Critique-Guided Distillation for Robust Reasoning via Refinement (2026)
