AIME
Emerging (1 paper using it)
First seen: 2026
The AIME benchmark is used to evaluate the reasoning capabilities of large language models in the context of test-time scaling.