Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data
2024 Β· Zhiyuan Zhou, Andy Peng, Qiyang Li, et al.
Abstract
The modern paradigm in machine learning involves pre-training on diverse data, followed by task-specific fine-tuning. In reinforcement learning (RL), this translates to learning via offline RL on a diverse historical dataset, followed by rapid online RL fine-tuning using interaction data. Most RL fine-tuning methods require continued training on offline data for stability and performance. However, this is undesirable because training on diverse offline data is slow and expensive for large datasets, and in principle, also limit the performance improvement possible because of constraints or pessimism on offline data. In this paper, we show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL initializations. To build this approach, we start by analyzing the role of retaining offline data in online fine-tuning. We find that continued training on offline data is mostly useful for preventing a sudden divergence in the
Authors
(none)
Tags
Stats
Related papers
- Leveraging Offline Data In Online Reinforcement Learning (2022)0.00
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)0.00
- The Three Regimes Of Offline-to-online Reinforcement Learning (2025)0.00
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)0.00
- Reward-agnostic Fine-tuning: Provable Statistical Benefits Of Hybrid Reinforcement Learning (2023)0.00
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)0.00
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)0.00
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)6.77