
AIME 2024

Emerging

23 papers using it

36,217 HF downloads

83 HF likes

2025 first seen

AIME 2024 Dataset

Dataset Description: This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024. AIME is a prestigious high school mathematics competition known for its challenging mathematical problems.

Dataset Details:
Format: JSONL
Size: 30 records
Source: AIME 2024 I & II
Lang…
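Since the description gives the format as JSONL, a minimal sketch of reading such a file might look like the following. The helper name `load_jsonl` and the key `example_key` are hypothetical. The listing does not name the record fields, so the sample uses placeholder data rather than real problems.

```python
import io
import json
from typing import Any, Dict, Iterable, List


def load_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL input: one JSON object per line, blank lines skipped."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# Placeholder records; the key name is made up for illustration only.
sample = io.StringIO('{"example_key": "first"}\n\n{"example_key": "second"}\n')
records = load_jsonl(sample)
print(len(records))  # 2 for this sample
```

Passing an open file handle instead of the `io.StringIO` sample works the same way. For the real file, the record count should match the 30 records stated in the description.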

🤗 Hugging Face · ⚖ MIT

Papers using AIME 2024 (23)

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space · 2025 · 26 cites

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback · 2025

Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning · 2026

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention · 2025

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping · 2025

Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation · 2025

Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty · 2025

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping · 2025

LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! · 2025

SIFT: Grounding LLM Reasoning in Contexts via Stickers · 2025

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation · 2025

Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking · 2025

Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think · 2025

Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier · 2025

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space · 2025

Prior Prompt Engineering for Reinforcement Fine-Tuning · 2025

Not All Correct Answers Are Equal: Why Your Distillation Source Matters · 2025

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models · 2025

Self-Reflective Generation at Test Time · 2025

DeepPrune: Parallel Scaling without Inter-trace Redundancy · 2025

Information-Preserving Reformulation of Reasoning Traces for Antidistillation · 2025

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention · 2025

Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning · 2025
