How To Fine-tune The Model: Unified Model Shift And Model Bias Policy Optimization
2023 · Hai Zhang, Hang Yu, Junqiao Zhao, et al.
Abstract
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly because model learning and policy optimization are tightly coupled. Many prior methods that rely on return discrepancy to guide model learning ignore the impact of model shift, which can degrade performance through excessive model updates. Other methods use a performance difference bound to explicitly account for model shift. However, these methods constrain model shift with a fixed threshold, so they depend heavily on the choice of that threshold and cannot adapt during training. In this paper, we theoretically derive an optimization objective that unifies model shift and model bias, and then formulate a fine-tuning process. This process adaptively adjusts the model updates to obtain a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward …
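The abstract contrasts a fixed model-shift threshold with an adaptive rule that weighs model shift against model bias, but it does not state the derived objective. The toy sketch below only illustrates that contrast under loudly stated assumptions: scalar deterministic dynamics, model bias measured as mean squared error on real transitions, model shift measured as mean squared disagreement between the old and updated model, and a hypothetical acceptance score `shift + bias(new) - bias(old)`. The functions `model_bias`, `model_shift`, `accept_fixed_threshold`, and `accept_adaptive` are hypothetical names; this is not the authors' objective or algorithm.

```python
from typing import Callable, List, Tuple

# Hypothetical toy setup: scalar state and action, deterministic dynamics.
Model = Callable[[float, float], float]  # (state, action) -> predicted next state
Transition = Tuple[float, float, float]  # (state, action, observed next state)


def model_bias(model: Model, data: List[Transition]) -> float:
    """Mean squared error of the model against real transitions (assumed bias measure)."""
    return sum((model(s, a) - s_next) ** 2 for s, a, s_next in data) / len(data)


def model_shift(old: Model, new: Model, data: List[Transition]) -> float:
    """Mean squared disagreement between old and updated model (assumed shift measure)."""
    return sum((new(s, a) - old(s, a)) ** 2 for s, a, _ in data) / len(data)


def accept_fixed_threshold(old: Model, new: Model, data: List[Transition], delta: float) -> bool:
    """Fixed-threshold rule: accept the update if model shift does not exceed delta."""
    return model_shift(old, new, data) <= delta


def accept_adaptive(old: Model, new: Model, data: List[Transition]) -> bool:
    """Hypothetical adaptive rule: accept if bias reduction outweighs the shift incurred.

    score = shift(old, new) + bias(new) - bias(old).
    With squared errors this equals the mean of 2 * (new - old) * (new - target),
    which is negative when the new prediction lands strictly between the old
    prediction and the target, and positive when the update overshoots.
    """
    score = model_shift(old, new, data) + model_bias(new, data) - model_bias(old, data)
    return score < 0.0


if __name__ == "__main__":
    # Real dynamics are s' = s + a; the old model ignores the action.
    data = [(0.0, 1.0, 1.0), (1.0, -1.0, 0.0), (2.0, 0.5, 2.5)]
    old = lambda s, a: s
    candidates = {
        "half step": lambda s, a: s + 0.5 * a,
        "overshoot": lambda s, a: s + 1.5 * a,
        "large overshoot": lambda s, a: s + 3.0 * a,
    }
    for name, new in candidates.items():
        print(
            f"{name:>15}: shift={model_shift(old, new, data):.4f} "
            f"fixed(1.0)={accept_fixed_threshold(old, new, data, 1.0)} "
            f"fixed(2.0)={accept_fixed_threshold(old, new, data, 2.0)} "
            f"adaptive={accept_adaptive(old, new, data)}"
        )
```

In this toy run, the "overshoot" candidate passes or fails the fixed rule depending on whether `delta` is 1.0 or 2.0, while the adaptive score rejects it without any threshold. This mirrors, in miniature, the threshold sensitivity the abstract criticizes; it says nothing about how the paper actually measures or combines the two terms.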
Related papers
- When To Update Your Model: Constrained Model-based Reinforcement Learning (2022)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- Mismatched No More: Joint Model-policy Optimization For Model-based RL (2021)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- The Virtues Of Laziness In Model-based RL: A Unified Objective And Algorithms (2023)
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019)
- Conservative Dual Policy Optimization For Efficient Model-based Reinforcement Learning (2022)
- Mitigating Distribution Shift In Model-based Offline RL Via Shifts-aware Reward Learning (2024)