A Relative-budget Theory For Reinforcement Learning With Verifiable Rewards In Large Language Model Reasoning
2026 · Akifumi Wachi, Hirota Kinoshita, Shokichi Takakura, et al.
Abstract
Reinforcement learning (RL) is a dominant paradigm for improving the reasoning abilities of large language models, yet its effectiveness varies across tasks and compute budgets. We propose a *relative-budget* theory explaining this variation through a single quantity called the relative budget \(\xi := H/\mathbb{E}[T]\), where \(H\) is the generation horizon (token budget) and \(T\) denotes the number of tokens until the first correct solution under a base policy. We show that \(\xi\) determines sample efficiency by controlling reward variance and the likelihood of informative trajectories. Our analysis reveals three regimes: in the *deficient* regime (\(\xi \to 0\)), informative trajectories are rare and the sample complexity explodes; in the *balanced* regime (\(\xi=\Theta(1)\)), informative trajectories occur with non-negligible probability and RL is maximally sample-efficient; and in the *ample* regime (\(\xi \to \infty\)), learning remains stable but marginal gains per iteration diminish…
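The three regimes can be illustrated with a toy model. The sketch below is not the paper's construction; it assumes (1) that \(T\) is exponentially distributed with mean \(\mathbb{E}[T]\), so that \(P(T \le H) = 1 - e^{-\xi}\), (2) a binary verifiable reward \(R = \mathbb{1}\{T \le H\}\), and (3) a GRPO-style proxy for "informative": a group of rollouts is informative only if it mixes successes and failures. The helper names and the regime thresholds (0.1 and 10) are hypothetical, chosen only for illustration.

```python
import math
import random

# Toy illustration only. Assumptions (NOT from the paper):
#   T ~ Exponential(mean = mean_t), binary reward R = 1{T <= H}.
# All function names and regime thresholds below are hypothetical.


def relative_budget(horizon: float, mean_t: float) -> float:
    """Relative budget xi := H / E[T]."""
    return horizon / mean_t


def success_prob(xi: float) -> float:
    """P(T <= H) = 1 - exp(-H / E[T]) under the exponential assumption."""
    return 1.0 - math.exp(-xi)


def reward_variance(xi: float) -> float:
    """Variance p * (1 - p) of the Bernoulli reward."""
    p = success_prob(xi)
    return p * (1.0 - p)


def informative_group_prob(xi: float, group_size: int) -> float:
    """Probability that a group of rollouts contains both a success and a failure.

    Hypothetical proxy: with group-normalized advantages, a group whose
    rewards are all equal yields zero gradient signal.
    """
    p = success_prob(xi)
    return 1.0 - p ** group_size - (1.0 - p) ** group_size


def regime(xi: float, low: float = 0.1, high: float = 10.0) -> str:
    """Label the regime using hypothetical cut-offs for 'xi -> 0' and 'xi -> inf'."""
    if xi < low:
        return "deficient"
    if xi > high:
        return "ample"
    return "balanced"


def simulate_success_rate(horizon: float, mean_t: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(T <= H) under the same exponential assumption."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(1.0 / mean_t) <= horizon for _ in range(n))
    return hits / n


if __name__ == "__main__":
    mean_t = 2000.0  # hypothetical E[T] in tokens
    for horizon in (20, 1386, 2000, 40000):
        xi = relative_budget(horizon, mean_t)
        print(
            f"H={horizon:>6}  xi={xi:7.3f}  regime={regime(xi):<9} "
            f"p={success_prob(xi):.3f}  var={reward_variance(xi):.4f}  "
            f"P(informative, G=8)={informative_group_prob(xi, 8):.3f}"
        )
```

Under these assumptions the reward variance \(p(1-p)\) vanishes both as \(\xi \to 0\) (almost no successes) and as \(\xi \to \infty\) (almost no failures), and it peaks at \(\xi = \ln 2 \approx 0.69\), a \(\Theta(1)\) value, which mirrors the deficient, balanced, and ample picture above. The exponential model was picked only because it gives a closed form; other distributions for \(T\) would shift the peak but not the qualitative shape.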
Related papers
- OBLR-PO: A Theoretical Framework For Stable Reinforcement Learning (2025)
- Scaling Behaviors Of LLM Reinforcement Learning Post-training: An Empirical Study In Mathematical Reasoning (2025)
- Rule-bottleneck Reinforcement Learning: Joint Explanation And Decision Optimization For Resource Allocation With Language Agents (2025)
- DISPO: Enhancing Training Efficiency And Stability In Reinforcement Learning For Large Language Model Mathematical Reasoning (2026)
- The Implicit Curriculum: Learning Dynamics In RL With Verifiable Rewards (2026)
- On The Optimization Dynamics Of RLVR: Gradient Gap And Step Size Thresholds (2025)
- Response-level Rewards Are All You Need For Online Reinforcement Learning In LLMs: A Mathematical Perspective (2025)
- A Survey Of Reinforcement Learning For Large Language Models Under Data Scarcity: Challenges And Solutions (2026)