
AIME

Emerging
1 paper using it
2026 first seen

AIME is a benchmark used to evaluate reinforcement learning methods on long-horizon logical reasoning tasks for large language models.

Papers using AIME (1)
