Value Gradient Weighted Model-based Reinforcement Learning
2022 · Claas Voelcker, Victor Liao, Animesh Garg, et al.
Abstract
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood (MLE) based approaches. In this paper, we propose the Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning which improves the perfo
Related papers
- Policy-aware Model Learning For Policy Gradient Methods (2020)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)
- How To Fine-tune The Model: Unified Model Shift And Model Bias Policy Optimization (2023)
- Model Imitation For Model-based Reinforcement Learning (2019)
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Objective Mismatch In Model-based Reinforcement Learning (2020)
- The Value Equivalence Principle For Model-based Reinforcement Learning (2020)
- Gradient-aware Model-based Policy Search (2019)