
AIME-24

Emerging
5 papers using it
2024 first seen

The AIME-24 benchmark is used to evaluate Tool-Integrated Reasoning systems. Its tasks require strategic planning and self-correction through sequential tool invocation.

Papers using AIME-24 (5)
