Train Once, Get A Family: State-adaptive Balances For Offline-to-online Reinforcement Learning
2023 · Shenzhi Wang, Qisen Yang, Jiawei Gao, et al.
Abstract
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a
Related papers
- Adaptive Policy Selection And Fine-tuning Under Interaction Budgets For Offline-to-online Reinforcement Learning (2026)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- The Three Regimes Of Offline-to-online Reinforcement Learning (2025)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024)
- Leveraging Offline Data In Online Reinforcement Learning (2022)