A Survey Of Reinforcement Learning For Large Language Models Under Data Scarcity: Challenges And Solutions
2026 · Zhiyin Yu, Yuchen Mou, Juncheng Yan, et al.
Abstract
Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space
Related papers
- A Survey On Enhancing Reinforcement Learning In Complex Environments: Insights From Human And LLM Feedback (2024)
- OBLR-PO: A Theoretical Framework For Stable Reinforcement Learning (2025)
- Reinforcement Learning Fine-tunes A Sparse Subnetwork In Large Language Models (2025)
- Scaling Behaviors Of LLM Reinforcement Learning Post-training: An Empirical Study In Mathematical Reasoning (2025)
- Zero-shot Model-based Reinforcement Learning Using Large Language Models (2024)
- Guiding Reinforcement Learning Using Uncertainty-aware Large Language Models (2024)
- Reinforcement Learning In The Era Of LLMs: What Is Essential? What Is Needed? An RL Perspective On RLHF, Prompting, And Beyond (2023)
- Mental Modeling Of Reinforcement Learning Agents By Language Models (2024)