Adaptive Behavior Cloning Regularization For Stable Offline-to-online Reinforcement Learning
2022 · Yi Zhao, Rinu Boney, Alexander Ilin, et al.
Abstract
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment. However, depending on the quality of the offline dataset, such pre-trained agents may have limited performance and would further need to be fine-tuned online by interacting with the environment. During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data. While constraints enforced by offline RL methods, such as a behavior cloning loss, prevent this to an extent, these constraints also significantly slow down online fine-tuning by forcing the agent to stay close to the behavior policy. We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability. Moreover, we use a randomized ensemble of Q functions to further increase the sample efficiency of online fine-tuning by perf
Related papers
- Know Your Boundaries: The Necessity Of Explicit Behavioral Cloning In Offline RL (2022)
- Reliable Conditioning Of Behavioral Cloning For Offline Reinforcement Learning (2022)
- Improving TD3-BC: Relaxed Policy Constraint For Offline Learning And Stable Online Fine-tuning (2022)
- Balancing Policy Constraint And Ensemble Size In Uncertainty-based Offline Reinforcement Learning (2023)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? (2022)
- Iteratively Refined Behavior Regularization For Offline Reinforcement Learning (2023)
- Efficient Offline Reinforcement Learning: First Imitate, Then Improve (2024)