AIME-24

Emerging

8 papers using it

2024 first seen

AIME-24 is a benchmark built from the problems of the 2024 American Invitational Mathematics Examination (AIME). In the papers listed here, it is used to evaluate Tool-Integrated Reasoning systems, in particular their ability to interleave deliberation and strategic planning with tool calls.


Papers using AIME-24 (4)

DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning (2026)

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems (2026)

What If We Allocate Test-Time Compute Adaptively? (2026)

Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling (2026)