
AIME

Emerging

1 paper using it

First seen: 2026

AIME is a benchmark of problems from the American Invitational Mathematics Examination. It is used to evaluate the mathematical reasoning capabilities of large language models, here in the context of test-time scaling.


Papers using AIME (1)

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling (2026)