
AIME

Emerging
1 paper using it
2026 first seen

AIME (American Invitational Mathematics Examination) is a benchmark of competition mathematics problems. It is used to evaluate the reasoning capabilities of large language models in the context of test-time scaling.

Papers using AIME (1)