The Three Regimes Of Offline-to-online Reinforcement Learning
2025 · Lu Li, Tianwei Ni, Yihao Sun, et al.
Abstract
Offline-to-online reinforcement learning (RL) has emerged as a practical paradigm that leverages offline datasets for pretraining and online interactions for fine-tuning. However, its empirical behavior is highly inconsistent: design choices for online fine-tuning that work well in one setting can fail completely in another. We propose a stability-plasticity principle that can explain this inconsistency: during online fine-tuning, we should preserve the knowledge of the pretrained policy or the offline dataset, whichever is better, while maintaining sufficient plasticity. This perspective identifies three regimes of online fine-tuning, each requiring distinct stability properties. We validate this framework through a large-scale empirical study, finding that the results strongly align with its predictions in 45 of 63 cases, with only 3 opposite mismatches. This work provides a principled framework for guiding design choices in offline-to-online RL based on the relative performance of the offlin
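As a rough illustration of the "whichever is better" rule in the abstract, the sketch below picks which knowledge source to preserve by comparing two scalar returns. It is not the authors' procedure: the names `Regime` and `classify_regime`, the `tolerance` parameter, and the assumption that the three regimes correspond to "policy better", "dataset better", and "comparable" are all hypothetical, since the abstract only says that the regimes depend on relative performance.

```python
from enum import Enum


class Regime(Enum):
    # Hypothetical labels; the abstract does not name the three regimes.
    POLICY_BETTER = "preserve the pretrained policy"
    DATASET_BETTER = "preserve the offline dataset"
    COMPARABLE = "either source is a reasonable anchor"


def classify_regime(policy_return: float,
                    dataset_return: float,
                    tolerance: float = 0.05) -> Regime:
    """Pick the knowledge source to keep stable during online fine-tuning.

    policy_return:  evaluated return of the pretrained policy.
    dataset_return: a return statistic of the offline dataset
                    (which statistic to use is an assumption left to the caller).
    tolerance:      relative gap below which the two are treated as comparable
                    (an illustrative threshold, not a value from the paper).
    """
    # Normalize the gap by the larger magnitude so the threshold is scale-free.
    scale = max(abs(policy_return), abs(dataset_return), 1e-8)
    gap = (policy_return - dataset_return) / scale
    if gap > tolerance:
        return Regime.POLICY_BETTER
    if gap < -tolerance:
        return Regime.DATASET_BETTER
    return Regime.COMPARABLE
```

For example, `classify_regime(80.0, 50.0)` selects the pretrained policy as the thing to preserve, while `classify_regime(70.0, 71.0)` falls inside the tolerance band and is treated as comparable.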
Related papers
- Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data (2024)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- Leveraging Offline Data In Online Reinforcement Learning (2022)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)
- Optimality Inductive Biases And Agnostic Guidelines For Offline Reinforcement Learning (2021)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)