Self-correcting Models For Model-based Reinforcement Learning
2016 · Erik Talvitie
Abstract
When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.
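The compounding effect described in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the scalar dynamics, the coefficients 1.10 and 1.12, and the names `rollout`, `true_step`, and `model_step` are all hypothetical, chosen only to show how a small one-step error can grow when a flawed model's predictions are composed over a planning horizon.

```python
def rollout(step, s0, horizon):
    """Compose a one-step function with itself `horizon` times."""
    states = [s0]
    for _ in range(horizon):
        states.append(step(states[-1]))
    return states


def true_step(s):
    # Hypothetical deterministic environment dynamics (an assumption).
    return 1.10 * s


def model_step(s):
    # Hypothetical flawed model: slightly wrong coefficient (an assumption).
    return 1.12 * s


real = rollout(true_step, 1.0, 20)
imagined = rollout(model_step, 1.0, 20)

# Absolute gap between the real and imagined state at each step.
errors = [abs(r - m) for r, m in zip(real, imagined)]

one_step_error = errors[1]   # about 0.02 after a single prediction
final_error = errors[-1]     # gap after 20 composed predictions
print(one_step_error, final_error)
```

In this toy setting the model looks accurate when judged by one-step error alone, yet the gap after 20 composed steps is more than 100 times larger. This is the kind of mismatch between one-step accuracy and multi-step usefulness that motivates evaluating a model on its own rollouts.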
Related papers
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Learning The Reward Function For A Misspecified Model (2018)
- Acting Upon Imagination: When To Trust Imagined Trajectories In Model Based Reinforcement Learning (2021)
- Learning To Combat Compounding-error In Model-based Reinforcement Learning (2019)
- When To Update Your Model: Constrained Model-based Reinforcement Learning (2022)
- Planning With Exploration: Addressing Dynamics Bottleneck In Model-based Reinforcement Learning (2020)
- An Analysis Of Model-based Reinforcement Learning From Abstracted Observations (2022)
- Objective Mismatch In Model-based Reinforcement Learning (2020)