
AIME

Emerging
27 papers using it
74 HF downloads
0 HF likes
First seen: 2025

The AIME benchmark dataset is a collection of mathematical reasoning tasks used to evaluate how well large language models produce correct answers and the intermediate reasoning steps that lead to them.

Papers using AIME (27)
