Improving Offline-to-online Reinforcement Learning With Q Conditioned State Entropy Exploration
2023 · Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, et al.
Abstract
Studying how to fine-tune a policy pre-trained with offline reinforcement learning (RL) is important for improving the sample efficiency of RL algorithms. However, directly fine-tuning pre-trained policies often results in sub-optimal performance, primarily because of the distribution shift between the offline pre-training and online fine-tuning stages. Specifically, the distribution shift limits the acquisition of effective online samples, which ultimately degrades online fine-tuning performance. To narrow the distribution shift between the offline and online stages, we propose Q conditioned state entropy (QCSE) as an intrinsic reward. QCSE maximizes the state entropy of each sample individually, taking into account its respective Q value. This approach encourages exploration of low-frequency samples while penalizing high-frequency ones, and implicitly achieves State Marginal Matching (SMM), thereby ensuring optimal performance, solving the asymptotic sub-optimal
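The abstract does not spell out how the Q conditioned entropy is estimated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a particle-based k-nearest-neighbor estimate in which the distance between two samples is the larger of their state distance and their Q-value gap, so that a sample earns a large bonus when few other samples share both a similar state and a similar Q value. The function name `q_conditioned_knn_bonus`, the max-norm joint distance, the `log1p` squashing, and the default `k` are all illustrative assumptions.

```python
import numpy as np


def q_conditioned_knn_bonus(states, q_values, k=3):
    """Hypothetical intrinsic bonus: log(1 + distance to the k-th nearest
    neighbor), measured jointly over states and Q values.

    states:   array of shape (N, D)
    q_values: array of shape (N,), one Q estimate per sample (assumed given)
    """
    states = np.asarray(states, dtype=np.float64)
    q = np.asarray(q_values, dtype=np.float64).reshape(-1, 1)
    n = states.shape[0]
    if q.shape[0] != n:
        raise ValueError("states and q_values must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    # Pairwise Euclidean distances between states, shape (N, N).
    state_dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # Pairwise absolute gaps between Q values, shape (N, N).
    q_dist = np.abs(q - q.T)
    # Assumed joint metric: two samples are close only if both their
    # states and their Q values are close.
    joint_dist = np.maximum(state_dist, q_dist)

    # After sorting each row, column 0 is the self-distance (zero), so
    # column k holds the distance to the k-th nearest other sample.
    kth_dist = np.sort(joint_dist, axis=1)[:, k]
    # Rare (low-frequency) samples have a large k-NN distance and thus
    # receive a large bonus; samples in dense regions receive a small one.
    return np.log1p(kth_dist)


# Usage: twenty tightly clustered samples plus one isolated sample.
rng = np.random.default_rng(0)
states = np.vstack([rng.normal(0.0, 0.05, size=(20, 2)), [[5.0, 5.0]]])
q_values = np.zeros(21)
bonus = q_conditioned_knn_bonus(states, q_values)
```

Under these assumptions the isolated sample receives the largest bonus, which matches the abstract's description of encouraging low-frequency samples and penalizing high-frequency ones. In practice the bonus would be added to the environment reward during online fine-tuning, with a scaling coefficient that this sketch leaves out.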
Related papers
- Cal-QL: Calibrated Offline RL Pre-training For Efficient Online Fine-tuning (2023)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- State-constrained Offline Reinforcement Learning (2024)
- Uncertainty-based Offline Reinforcement Learning With Diversified Q-ensemble (2021)
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021)
- EMaQ: Expected-max Q-learning Operator For Simple Yet Effective Offline And Online RL (2020)
- Boosting Offline Reinforcement Learning With Residual Generative Modeling (2021)