Gradient-aware Model-based Policy Search
2019 · Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, et al.
Abstract
Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the c
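The idea of a weighted model-learning objective can be illustrated with a minimal sketch. It is not the GAMPS implementation: the one-parameter linear model, the toy data, and the hand-picked weights are all assumptions made for illustration, and the weights here merely stand in for the gradient-derived weights the abstract describes. The sketch shows that, when the model class is misspecified, weighting the fitting loss moves the learned model toward the transitions that are treated as most relevant.

```python
def fit_weighted_linear(xs, ys, weights):
    """Weighted least squares for the one-parameter model y ~ theta * x.

    Under Gaussian noise with fixed variance this is also the weighted
    maximum-likelihood estimate of theta.
    """
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    if den == 0.0:
        raise ValueError("weights leave no informative samples")
    return num / den


# Toy transitions (hypothetical): the true slope is 1.0 for x < 0 and 3.0
# for x > 0, so a single-slope model is misspecified.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-2.0, -1.0, 3.0, 6.0]

# Uniform weights: the fit compromises between the two regions.
uniform = fit_weighted_linear(xs, ys, [1.0, 1.0, 1.0, 1.0])

# Hand-picked weights (assumption): only the x > 0 region is treated as
# relevant, standing in for weights derived from the current policy.
focused = fit_weighted_linear(xs, ys, [0.0, 0.0, 1.0, 1.0])

print(uniform, focused)
```

With uniform weights the fitted slope falls between the two regimes, while the focused weights recover the slope of the region they emphasize, at the cost of accuracy elsewhere.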
Related papers
- Policy-aware Model Learning For Policy Gradient Methods (2020)
- Model-free Policy Learning With Reward Gradients (2021)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)
- Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model (2021)
- PIPPS: Flexible Model-based Policy Search Robust To The Curse Of Chaos (2019)
- On-policy Model Errors In Reinforcement Learning (2021)
- Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective (2024)