Learning To Combat Compounding-error In Model-based Reinforcement Learning
2019 · Chenjun Xiao, Yifan Wu, Chen Ma, et al.
Abstract
Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for choosing the planning horizon on a state-dependent basis, where a state's planning horizon is determined by the maximum cumulative model error around that state. We demonstrate that these state-dependent model errors can be learned with Temporal Difference methods, based on a novel approach of temporally decomposing the cumulative model errors. Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
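The idea of learning multi-step model errors by temporal decomposition and then picking a per-state planning horizon can be illustrated with a small tabular sketch. This is not the paper's algorithm: it assumes a fixed deterministic policy, known non-negative one-step model errors, and a cumulative error defined as the plain sum of one-step errors along the rollout, so that E_h(s) = eps(s) + E_{h-1}(s'). The names `learn_cumulative_errors`, `select_horizons`, the chain environment, and the threshold value are all hypothetical choices made for illustration.

```python
def learn_cumulative_errors(one_step_error, next_state, max_horizon,
                            alpha=0.5, sweeps=200):
    """TD-style estimate of the h-step cumulative model error for each state.

    Assumed decomposition (illustrative, not taken from the paper):
        E_h(s) = eps(s) + E_{h-1}(s'),  with E_0(s) = 0.
    errors[h][s] holds the current estimate of E_h(s).
    """
    n = len(one_step_error)
    errors = [[0.0] * n for _ in range(max_horizon + 1)]
    for _ in range(sweeps):
        for h in range(1, max_horizon + 1):
            for s in range(n):
                # Bootstrapped target: immediate error plus the (h-1)-step
                # estimate at the successor state.
                target = one_step_error[s] + errors[h - 1][next_state[s]]
                errors[h][s] += alpha * (target - errors[h][s])
    return errors


def select_horizons(errors, threshold):
    """Largest horizon per state whose estimated cumulative error stays
    at or below the threshold; 0 means fall back to model-free updates."""
    n = len(errors[0])
    horizons = []
    for s in range(n):
        h = 0
        while h + 1 < len(errors) and errors[h + 1][s] <= threshold:
            h += 1
        horizons.append(h)
    return horizons


# Hypothetical 8-state chain: the policy always moves right, and the
# model is accurate in states 0-4 but wrong (error 1.0) in states 5-7.
n_states = 8
one_step_error = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
next_state = [min(s + 1, n_states - 1) for s in range(n_states)]

errors = learn_cumulative_errors(one_step_error, next_state, max_horizon=4)
horizons = select_horizons(errors, threshold=0.5)
print(horizons)  # [4, 4, 3, 2, 1, 0, 0, 0]
```

In this toy setting the selected horizon shrinks as a state gets closer to the inaccurate region and drops to zero inside it, which is the qualitative behavior the abstract describes: trust the model for long rollouts where it is accurate, and rely on model-free updates where the accumulated error is too large.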
Related papers
- Learning With Imperfect Models: When Multi-step Prediction Mitigates Compounding Error (2025)
- Self-correcting Models For Model-based Reinforcement Learning (2016)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- On-policy Model Errors In Reinforcement Learning (2021)
- A Note On Loss Functions And Error Compounding In Model-based Reinforcement Learning (2024)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Learning Dynamics Model In Reinforcement Learning By Incorporating The Long Term Future (2019)
- Temporal Difference Models: Model-free Deep RL For Model-based Control (2018)