Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions
2023 · Yicheng Luo, Jackie Kay, Edward Grefenstette, et al.
Abstract
Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.
Related papers
- Leveraging Offline Data In Online Reinforcement Learning (2022)
- Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data (2024)
- Policy Finetuning: Bridging Sample-efficient Offline And Online Reinforcement Learning (2021)
- The Three Regimes Of Offline-to-online Reinforcement Learning (2025)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)