AIME 2025-2026
Emerging
1 paper using it
First seen: 2026
The AIME 2025-2026 benchmark contains reasoning tasks and is used to evaluate model performance on reasoning-intensive problems, particularly in the context of retrieval-augmented generation.