On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning
2020 · Brandon Amos, Samuel Stanton, Denis Yarats, et al.
Abstract
For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing, their policies tend to lag behind those of model-free agents in terms of final reward, especially in non-trivial environments. In response, researchers have proposed model-based agents with increasingly complex components, from ensembles of probabilistic dynamics models, to heuristics for mitigating model error. In a reversal of this trend, we show that simple model-based agents can be derived from existing ideas that not only match, but outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward. We find that a model-free soft value estimate for policy evaluation and a model-based stochastic value gradient for policy improvement is an effective combination, achieving state-of-the-art results on a high-dimensiona
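The combination named in the abstract, a value estimate for policy evaluation and a model-based stochastic value gradient for policy improvement, can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the scalar linear dynamics model, the quadratic reward, the fixed-variance Gaussian policy, the quadratic stand-in for the soft value estimate, and every name and constant (`A`, `B`, `C`, `SIGMA`, `rollout`, `svg_estimate`) are hypothetical. A practical agent would presumably use learned neural networks and automatic differentiation; here the pathwise derivative is propagated by hand so the example needs only the standard library.

```python
import math
import random

# Hypothetical toy problem (not from the paper): scalar state x, scalar action u.
A, B = 0.9, 0.5            # assumed learned dynamics model: x' = A*x + B*u
GAMMA, ALPHA = 0.99, 0.1   # discount factor and entropy temperature
SIGMA = 0.3                # fixed policy standard deviation
C = 2.0                    # stand-in soft value estimate: V(x) = -C * x**2


def rollout(theta, x0, noise):
    """Return (J, dJ/dtheta) for one model rollout with fixed noise.

    The policy is reparameterized as u = theta*x + SIGMA*eps, so the
    derivative of the return flows through the actions and the model.
    """
    x, dx = x0, 0.0          # state and its derivative w.r.t. theta
    J, dJ = 0.0, 0.0
    disc = 1.0
    for eps in noise:
        u = theta * x + SIGMA * eps
        du = x + theta * dx
        log_pi = -0.5 * eps**2 - math.log(SIGMA) - 0.5 * math.log(2 * math.pi)
        reward = -(x**2 + 0.1 * u**2)
        dreward = -(2 * x * dx + 0.2 * u * du)
        # log_pi does not depend on theta here because SIGMA is fixed.
        J += disc * (reward - ALPHA * log_pi)
        dJ += disc * dreward
        # Step the model and its derivative together (uses the old x, dx).
        x, dx = A * x + B * u, A * dx + B * du
        disc *= GAMMA
    # Bootstrap the tail of the return with the value estimate.
    J += disc * (-C * x**2)
    dJ += disc * (-2 * C * x * dx)
    return J, dJ


def svg_estimate(theta, x0=1.0, horizon=3, n_samples=64, seed=0):
    """Monte Carlo average of the objective and its pathwise gradient."""
    rng = random.Random(seed)
    total_J = total_dJ = 0.0
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        J, dJ = rollout(theta, x0, noise)
        total_J += J
        total_dJ += dJ
    return total_J / n_samples, total_dJ / n_samples
```

Calling `svg_estimate(theta)` returns the sampled objective and its gradient with respect to the policy parameter; a policy-improvement step would then move `theta` a small distance along that gradient. Because the seed fixes the noise, the returned gradient can be checked against a finite difference of the same objective.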
Related papers
- Model-free Policy Learning With Reward Gradients (2021)
- Value Gradient Weighted Model-based Reinforcement Learning (2022)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)
- Sample-efficient Reinforcement Learning With Stochastic Ensemble Value Expansion (2018)
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Deterministic Value-policy Gradients (2019)