Cal-QL: Calibrated Offline RL Pre-training For Efficient Online Fine-tuning
2023 · Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, et al.
Abstract
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that
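The calibration property described in the abstract can be illustrated with a small check. This is a minimal sketch, not the authors' implementation: it assumes the property is tested sample by sample on toy numbers, and the function name `is_calibrated` and its arguments are hypothetical. The paper's formal definition may be stated differently (for example, in expectation over actions).

```python
from typing import Sequence


def is_calibrated(
    q_learned: Sequence[float],
    q_policy_true: Sequence[float],
    q_reference: Sequence[float],
    tol: float = 0.0,
) -> bool:
    """Check the two bounds named in the abstract on paired samples.

    q_learned:     learned Q-value estimates (hypothetical inputs).
    q_policy_true: true values of the learned policy (upper limit,
                   since the learned values should be a lower bound).
    q_reference:   values of a reference policy, e.g. the behavior
                   policy (lower limit, since the learned values
                   should be an upper bound on them).
    tol:           slack allowed on both bounds (an assumption).
    """
    if not (len(q_learned) == len(q_policy_true) == len(q_reference)):
        raise ValueError("all three sequences must have the same length")
    return all(
        ref - tol <= q <= true + tol
        for q, true, ref in zip(q_learned, q_policy_true, q_reference)
    )


# Toy values only, chosen for illustration.
conservative_and_calibrated = is_calibrated([1.0, 2.0], [1.5, 2.5], [0.5, 1.0])
overestimating = is_calibrated([3.0], [2.0], [1.0])
badly_scaled = is_calibrated([-5.0], [2.0], [1.0])
```

In this sketch, the second case fails because the learned value exceeds the true value of the learned policy, and the third fails because the learned value drops below the reference policy's value, which is the "unreasonable scale" situation the abstract describes.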
Related papers
- ACL-QL: Adaptive Conservative Level In Q-learning For Offline Reinforcement Learning (2024) 0.00
- Improving Offline-to-online Reinforcement Learning With Q Conditioned State Entropy Exploration (2023) 0.00
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023) 0.00
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021) 0.00
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022) 0.00
- A Perspective Of Q-value Estimation On Offline-to-online Reinforcement Learning (2023) 7.81
- Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data (2024) 0.00
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024) 0.00