A Non-monolithic Policy Approach Of Offline-to-online Reinforcement Learning
2024 · Jaeyoon Kim, Junyu Xuan, Christy Liang, et al.
Abstract
Offline-to-online reinforcement learning (RL) leverages both pre-trained offline policies and online policies trained for downstream tasks, aiming to improve data efficiency and accelerate performance enhancement. An existing approach, Policy Expansion (PEX), utilizes a policy set composed of both policies without modifying the offline policy for exploration and learning. However, this approach fails to ensure sufficient learning of the online policy due to an excessive focus on exploration with both policies. Since the pre-trained offline policy can assist the online policy in exploiting a downstream task based on its prior experience, it should be executed effectively and tailored to the specific requirements of the downstream task. In contrast, the online policy, with its immature behavioral strategy, has the potential for exploration during the training phase. Therefore, our research focuses on harmonizing the advantages of the offline policy, termed exploitation, with those of the
Related papers
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023)
- Policy Agnostic RL: Offline RL And Online RL Fine-tuning Of Any Class And Backbone (2024)
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022)
- Uni-o4: Unifying Online And Offline Deep Reinforcement Learning With Multi-step On-policy Optimization (2023)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)