COPlanner: Plan To Roll Out Conservatively But To Explore Optimistically For Model-based RL
2023 · Xiyao Wang, Ruijie Zheng, Yanchao Sun, et al.
Abstract
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration using the current policy for dynamics model learning. However, because real-world environments are complex, the learned dynamics model is inevitably imperfect, and its prediction error can mislead policy learning and result in sub-optimal solutions. In this paper, we propose \(\texttt{COPlanner}\), a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model with conservative model rollouts and optimistic environment exploration. \(\texttt{COPlanner}\) leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty when choosing actions during model rollouts and as a bonus when choosing actions during real environment exploration. Consequently, \(\texttt{CO
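The idea of using one multi-step uncertainty estimate as a penalty during model rollouts and as a bonus during exploration can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how uncertainty is measured, so the sketch assumes disagreement across a model ensemble, and the random linear models, the `policy` and `reward_fn` functions, and the parameters `n_candidates`, `horizon`, and `alpha` are all hypothetical stand-ins.

```python
import numpy as np


def make_ensemble(n_models, state_dim, action_dim, rng):
    # Hypothetical stand-in for a learned dynamics ensemble: random linear models.
    return [
        (np.eye(state_dim) + 0.1 * rng.normal(size=(state_dim, state_dim)),
         0.1 * rng.normal(size=(state_dim, action_dim)))
        for _ in range(n_models)
    ]


def ensemble_step(ensemble, states, actions):
    # states: (K, S), actions: (K, A). Each model predicts a next state.
    preds = np.stack([states @ A.T + actions @ B.T for A, B in ensemble])
    mean_next = preds.mean(axis=0)                   # (K, S)
    # Assumed uncertainty measure: ensemble variance summed over state dims.
    disagreement = preds.var(axis=0).sum(axis=-1)    # (K,)
    return mean_next, disagreement


def policy(states, rng, action_dim, noise=0.3):
    # Hypothetical stochastic policy used to propose candidate actions.
    mean = np.tanh(-states[:, :action_dim])
    return mean + noise * rng.normal(size=mean.shape)


def reward_fn(states, actions):
    # Hypothetical reward: stay near the origin with small actions.
    return -np.sum(states ** 2, axis=-1) - 0.01 * np.sum(actions ** 2, axis=-1)


def plan_action(state, ensemble, rng, mode,
                n_candidates=16, horizon=3, alpha=1.0, action_dim=2):
    # mode="rollout": uncertainty is a penalty (conservative).
    # mode="explore": uncertainty is a bonus (optimistic).
    sign = {"rollout": -1.0, "explore": 1.0}[mode]
    states = np.repeat(state[None, :], n_candidates, axis=0)
    first_actions = policy(states, rng, action_dim)
    actions = first_actions
    scores = np.zeros(n_candidates)
    for _ in range(horizon):
        next_states, uncertainty = ensemble_step(ensemble, states, actions)
        scores += reward_fn(states, actions) + sign * alpha * uncertainty
        states = next_states
        actions = policy(states, rng, action_dim)
    best = int(np.argmax(scores))
    return first_actions[best], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = make_ensemble(n_models=5, state_dim=4, action_dim=2, rng=rng)
    state = np.ones(4)
    print("rollout action:", plan_action(state, ensemble, rng, "rollout")[0])
    print("explore action:", plan_action(state, ensemble, rng, "explore")[0])
```

The only difference between the two modes is the sign applied to the accumulated uncertainty, so the same planner can serve both phases; the weight `alpha` is an assumed knob for trading reward against uncertainty.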
Related papers
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy (2022)
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)
- Conservative Dual Policy Optimization For Efficient Model-based Reinforcement Learning (2022)
- Double Horizon Model-based Policy Optimization (2025)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- A KL-regularization Framework For Learning To Plan With Adaptive Priors (2025)