Scaling Behaviors Of LLM Reinforcement Learning Post-training: An Empirical Study In Mathematical Reasoning
2025 · Zelin Tan, Hejia Geng, Xiaohang Yu, et al.
Abstract
While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper presents a systematic empirical investigation of scaling behaviors in RL-based post-training, with a particular focus on mathematical reasoning. Based on a set of experiments across the full Qwen2.5 dense model series (0.5B to 72B), we characterize how model scale, data volume, and computational budget interact to shape performance. Our analysis leads to four key findings: 1. Larger models consistently exhibit superior learning efficiency on both compute and data metrics. 2. The relationship between test loss, compute, and data can be modeled by a predictive power law that is robust across both base and instruction-tuned models. 3. Although larger models exhibit higher learning efficiency, the analytical learning efficiency term k(N) in the power law reveals a latent saturation trend