AIME-24/25
Emerging
7 papers using it
First seen: 2025
AIME-24/25 is a benchmark drawn from the 2024 and 2025 American Invitational Mathematics Examination (AIME) problems. It is used to evaluate the mathematical reasoning of language models, including diffusion large language models (dLLMs), where token decoding must balance exploration against output quality.
Papers using AIME-24/25 (7)
- Inference Time Optimization with Confidence Dynamics
- Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
- Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
- Input-Time Scaling: Adding Noise and Irrelevance into Less-Is-More Drastically Improves Reasoning Performance and Efficiency
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
- Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution