AIME-24
Emerging · 29 papers using it
6,620 HF downloads
18 HF likes
2025 first seen
AIME 24
American Invitational Mathematics Examination (AIME) 2024

Citation
If you use the AIME24 dataset in your research, please consider citing it as follows:

    @misc{aime24,
      title={American Invitational Mathematics Examination (AIME) 2024},
      author={Zhang, Yifan and Math-AI, Team},
      year={2024},
    }
Hugging Face · apache-2.0
Papers using AIME-24 (29)
- Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models
- Introspective Diffusion Language Models
- LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
- Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning
- KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
- Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs
- Test-time Recursive Thinking: Self-Improvement without External Feedback
- Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
- MoL-RL: Distilling Multi-Step Environmental Feedback into LLMs for Feedback-Independent Reasoning
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
- Process Reward Models That Think
- SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
- First Finish Search: Efficient Test-Time Scaling in Large Language Models
- Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
- Skywork Open Reasoner 1 Technical Report
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
- Inference-Time Hyper-Scaling with KV Cache Compression
- ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
- DCPO: Dynamic Clipping Policy Optimization
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
- PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
- ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
- Skill-Targeted Adaptive Training
- Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
- Scaling Reasoning without Attention
- SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM