All-action Policy Gradient Methods: A Numerical Integration Approach
2019 · Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, et al.
Abstract
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach in continuous control tasks with nonlinear function approximation. Our results show improved performance and sample
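The contrast the abstract draws between the REINFORCE estimator and the all-action estimator can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a single state, a one-dimensional Gaussian policy with fixed standard deviation `SIGMA`, and a hypothetical critic `q_value`; the trapezoid grid (`n_points`, `width`) is likewise an illustrative choice. REINFORCE evaluates the integrand at one sampled action, while the all-action version integrates over the action space numerically.

```python
import math
import random

SIGMA = 1.0  # fixed policy standard deviation (assumption for this sketch)


def q_value(a):
    """Hypothetical critic for a single state; it peaks at a = 2."""
    return -(a - 2.0) ** 2


def gaussian_pdf(a, mu, sigma=SIGMA):
    """Density of the Gaussian policy pi(a | mu)."""
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def score(a, mu, sigma=SIGMA):
    """d/dmu log pi(a | mu) for a Gaussian policy with fixed sigma."""
    return (a - mu) / sigma ** 2


def reinforce_grad(mu, rng):
    """Single-sample likelihood-ratio (REINFORCE) gradient estimate."""
    a = rng.gauss(mu, SIGMA)
    return q_value(a) * score(a, mu)


def all_action_grad(mu, n_points=201, width=8.0):
    """Trapezoid-rule approximation of the integral over actions of
    Q(a) * d/dmu pi(a | mu), using d/dmu pi = pi * score."""
    lo = mu - width * SIGMA
    hi = mu + width * SIGMA
    h = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        a = lo + i * h
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end weights
        total += weight * q_value(a) * gaussian_pdf(a, mu) * score(a, mu)
    return total * h


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [reinforce_grad(0.0, rng) for _ in range(20000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print("REINFORCE mean and per-sample variance:", mean, var)
    print("all-action (trapezoid) estimate:", all_action_grad(0.0))
```

For this quadratic critic the exact gradient is -2(mu - 2), which equals 4 at mu = 0. The numerical integral is deterministic given the critic, so all of its error comes from the quadrature rule and from any critic bias, whereas each REINFORCE sample is unbiased but noisy.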
Related papers
- Action-dependent Control Variates For Policy Optimization Via Stein's Identity (2017)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Marginal Policy Gradients: A Unified Family Of Estimators For Bounded Action Spaces With Applications (2018)
- Policy Gradient Methods For Reinforcement Learning With Function Approximation And Action-dependent Baselines (2017)
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- An Off-policy Policy Gradient Theorem Using Emphatic Weightings (2018)