PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning
2023 · Jianxiong Li, Xiao Hu, Haoran Xu, et al.
Abstract
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes the aforementioned limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, birthing great adaptability to diverse methods. Simple yet elegant, PROTO imposes minimal additional computation and enables highly
Related papers
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- A Non-monolithic Policy Approach Of Offline-to-online Reinforcement Learning (2024)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Uni-o4: Unifying Online And Offline Deep Reinforcement Learning With Multi-step On-policy Optimization (2023)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)
- Adaptive Policy Selection And Fine-tuning Under Interaction Budgets For Offline-to-online Reinforcement Learning (2026)
- The Three Regimes Of Offline-to-online Reinforcement Learning (2025)