AIME
Emerging
27 papers using it
74 HF downloads
0 HF likes
First seen: 2025
The AIME dataset is a benchmark of mathematical reasoning tasks used to evaluate how well large language models generate correct solutions, including their intermediate reasoning steps.
Papers using AIME (27)
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
- CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
- Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning
- EBPO: Empirical Bayes Shrinkage for Stabilizing Group-Relative Policy Optimization
- Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
- ScRPO: From Errors to Insights
- h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
- Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
- CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections
- GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
- TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
- Maximizing Confidence Alone Improves Reasoning
- DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
- RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
- Know When to Explore: Difficulty-Aware Certainty as a Guide for LLM Reinforcement Learning
- WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
- ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization
- SD-E$^2$: Semantic Exploration for Reasoning Under Token Budgets
- Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models
- Evolutionary System Prompt Learning for Reinforcement Learning in LLMs
- LLM Reasoning with Process Rewards for Outcome-Guided Steps