Policy-aware Model Learning For Policy Gradient Methods
2020 · Romina Abachi, Mohammad Ghavamzadeh, Amir-Massoud Farahmand
Abstract
This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should account for the way the planner is going to use the model. This contrasts with conventional model learning approaches, such as those based on maximum likelihood estimation, which learn a predictive model of the environment without explicitly considering the interaction between the model and the planner. We focus on policy-gradient planning algorithms and derive new loss functions for model learning that incorporate how the planner uses the model. We call this approach Policy-Aware Model Learning (PAML). We theoretically analyze a generic model-based policy gradient algorithm and provide a convergence guarantee for the optimized policy. We also empirically evaluate PAML on several benchmark problems, showing promising results.
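The abstract does not state the loss itself, so the following is only a minimal sketch of one way to read "incorporate how the planner uses the model": score a learned model by how closely the policy gradient computed under it matches the policy gradient computed under the true dynamics. Everything here is an illustrative assumption rather than the paper's implementation: the two-state MDP, the softmax policy, the finite-difference gradient, and the names `expected_return`, `policy_gradient`, and `paml_style_loss` are all hypothetical.

```python
import math

GAMMA = 0.9  # discount factor (assumed for this toy example)


def softmax_policy(theta):
    """Turn per-state logits theta[s][a] into action probabilities pi[s][a]."""
    pi = []
    for row in theta:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        pi.append([e / z for e in exps])
    return pi


def expected_return(theta, P, R, start=0, sweeps=500):
    """Discounted return from `start` via iterative policy evaluation.

    P[s][a][t] is a transition probability, R[s][a] a reward.
    """
    pi = softmax_policy(theta)
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            sum(
                pi[s][a]
                * (R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(n)))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
    return V[start]


def policy_gradient(theta, P, R, eps=1e-5):
    """Central finite-difference gradient of the return w.r.t. the logits."""
    grad = []
    for s in range(len(theta)):
        for a in range(len(theta[s])):
            up = [row[:] for row in theta]
            dn = [row[:] for row in theta]
            up[s][a] += eps
            dn[s][a] -= eps
            diff = expected_return(up, P, R) - expected_return(dn, P, R)
            grad.append(diff / (2 * eps))
    return grad


def paml_style_loss(theta, P_true, P_model, R):
    """Squared distance between policy gradients under the two models."""
    g_true = policy_gradient(theta, P_true, R)
    g_model = policy_gradient(theta, P_model, R)
    return sum((x - y) ** 2 for x, y in zip(g_true, g_model))


# Toy data (hypothetical): 2 states, 2 actions.
R = [[0.0, 1.0], [2.0, 0.0]]
P_true = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
P_model = [[[0.6, 0.4], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
theta = [[0.0, 0.0], [0.0, 0.0]]  # uniform policy

loss_same = paml_style_loss(theta, P_true, P_true, R)
loss_diff = paml_style_loss(theta, P_true, P_model, R)
print(loss_same, loss_diff)
```

A loss of this kind is zero whenever the model reproduces the true policy gradient, even if its next-state predictions are imperfect elsewhere, whereas a likelihood-based loss penalizes every prediction error equally regardless of its effect on the planner.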
Related papers
- Value Gradient Weighted Model-based Reinforcement Learning (2022)
- Gradient-aware Model-based Policy Search (2019)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model (2021)
- A KL-regularization Framework For Learning To Plan With Adaptive Priors (2025)
- How To Fine-tune The Model: Unified Model Shift And Model Bias Policy Optimization (2023)
- On-policy Model Errors In Reinforcement Learning (2021)
- When To Trust Your Model: Model-based Policy Optimization (2019)