Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning
2020 · Sebastian Curi, Felix Berkenkamp, Andreas Krause
Abstract
Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty. However, while most algorithms distinguish these two uncertainties for learning the model, they ignore it when optimizing the policy, which leads to greedy and insufficient exploration. At the same time, there are no practical solvers for optimistic exploration algorithms. In this paper, we propose a practical optimistic exploration algorithm (H-UCRL). H-UCRL reparameterizes the set of plausible models and hallucinates control directly on the epistemic uncertainty. By augmenting the input space with the hallucinated inputs, H-UCRL can be solved using standard greedy planners. Furthermore, we analyze H-UCRL and construct a general regret bound for well-calibrated models, which is provably sublinear in the case of Gaussian Process models. Based on this t
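The reparameterization described in the abstract can be illustrated with a minimal sketch. It assumes the learned model exposes a predictive mean and an epistemic standard deviation, and that an extra "hallucinated" input eta in [-1, 1]^n selects a next state inside the confidence region mean ± beta * std. The names `hallucinated_step`, `toy_mean`, `toy_std`, and the scaling `beta` are hypothetical choices for this example and do not come from the page above.

```python
import numpy as np


def hallucinated_step(mean_fn, std_fn, state, action, eta, beta=1.0):
    """Optimistic transition: mean + beta * std * eta, with eta clipped to [-1, 1].

    mean_fn and std_fn stand in for a calibrated model's predictive mean and
    epistemic standard deviation (assumed interfaces, for illustration only).
    """
    eta = np.clip(eta, -1.0, 1.0)  # hallucinated control must stay in the unit box
    return mean_fn(state, action) + beta * std_fn(state, action) * eta


# Toy stand-ins for a learned model (hypothetical, not from the paper).
def toy_mean(state, action):
    return 0.9 * state + 0.1 * action


def toy_std(state, action):
    return 0.05 * np.ones_like(state)


state = np.array([1.0, -1.0])
action = np.array([0.5, 0.5])
eta = np.array([1.0, -2.0])  # second entry is out of range and gets clipped to -1

next_state = hallucinated_step(toy_mean, toy_std, state, action, eta, beta=2.0)
print(next_state)
```

Because eta enters the transition like an ordinary input, a standard planner can optimize over the concatenated pair (action, eta), which is the sense in which the abstract says the input space is augmented.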
Related papers
- Efficient Model-based Multi-agent Reinforcement Learning Via Optimistic Equilibrium Computation (2022)
- Combining Pessimism With Optimism For Robust And Efficient Model-based Deep Reinforcement Learning (2021)
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019)
- Coplanner: Plan To Roll Out Conservatively But To Explore Optimistically For Model-based RL (2023)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)