An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients
2021 · João Carvalho, Davide Tateo, Fabio Muratore, et al.
Abstract
Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high variance estimates. More modern approaches exploit the reparametrization trick, which gives lower variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, both
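The abstract contrasts three ways of estimating the gradient of an expectation. The sketch below is an illustration only, not the authors' code. It uses NumPy to estimate d/dμ E_{x∼N(μ,σ²)}[f(x)] for a one-dimensional Gaussian with each estimator. It assumes the standard measure-valued decomposition for the Gaussian mean, in which the two offsets are Rayleigh(1) distributed (equivalently Weibull with scale √2 and shape 2). The test functions `square` and `step` and all helper names are hypothetical. The paper's actor-critic setting is not reproduced here.

```python
import numpy as np


def score_function_grad(f, mu, sigma, n, rng):
    """Likelihood-ratio (score-function) estimate of d/dmu E_{x~N(mu, sigma^2)}[f(x)]."""
    x = rng.normal(mu, sigma, size=n)
    # Score of a Gaussian w.r.t. its mean: d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2
    samples = f(x) * (x - mu) / sigma**2
    return samples.mean(), samples.var()


def reparam_grad(f_prime, mu, sigma, n, rng):
    """Reparametrization estimate: x = mu + sigma * eps, so d/dmu f(x) = f'(x)."""
    eps = rng.standard_normal(n)
    samples = f_prime(mu + sigma * eps)  # requires the derivative of f
    return samples.mean(), samples.var()


def mvd_grad(f, mu, sigma, n, rng):
    """Measure-valued derivative estimate for the mean of a Gaussian (assumed decomposition).

    d/dmu N(x; mu, sigma^2) = c * (p_plus(x) - p_minus(x)), with
    c = 1 / (sigma * sqrt(2 * pi)) and p_plus, p_minus the laws of
    mu + sigma * W and mu - sigma * W, where W ~ Rayleigh(1).
    Only evaluations of f are needed, never its derivative.
    """
    w = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    # Sharing the same W between the two terms couples them, which lowers the variance
    samples = c * (f(mu + sigma * w) - f(mu - sigma * w))
    return samples.mean(), samples.var()


# Hypothetical test functions, chosen only for illustration.
def square(x):
    return x**2  # E[x^2] = mu^2 + sigma^2, so the true gradient is 2 * mu


def square_prime(x):
    return 2.0 * x


def step(x):
    # Non-differentiable; d/dmu P(x > 0) equals the N(mu, sigma^2) density evaluated at 0
    return (x > 0).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma, n = 1.5, 1.0, 200_000
    estimates = {
        "likelihood ratio": score_function_grad(square, mu, sigma, n, rng),
        "reparametrization": reparam_grad(square_prime, mu, sigma, n, rng),
        "measure-valued": mvd_grad(square, mu, sigma, n, rng),
    }
    print(f"true gradient: {2 * mu}")
    for name, (g, v) in estimates.items():
        print(f"{name:18s} estimate={g:.3f} per-sample variance={v:.3f}")
    # The step function has no usable derivative, but the MVD estimator only evaluates f
    g_step, _ = mvd_grad(step, 0.5, 1.0, n, rng)
    print(f"step, MVD: {g_step:.4f} (true: {np.exp(-0.125) / np.sqrt(2 * np.pi):.4f})")
```

For f(x) = x², all three estimators are unbiased. In this toy case, however, the likelihood-ratio samples have a much larger per-sample variance than the other two. This is a property of the example, not a general result. The reparametrization estimator cannot be applied to the step function. The measure-valued estimator still can, which mirrors the abstract's point about non-differentiable function approximators.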
Related papers
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)
- Model-free Policy Learning With Reward Gradients (2021)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)
- Unifying Gradient Estimators For Meta-reinforcement Learning Via Off-policy Evaluation (2021)
- Action-dependent Control Variates For Policy Optimization Via Stein's Identity (2017)