Trajectory-wise Control Variates For Variance Reduction In Policy Gradient Methods
2019 · Ching-An Cheng, Xinyan Yan, Byron Boots
Abstract
Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods. Previous research has endeavored to contend with this problem by studying control variates (CVs) that can reduce the variance of estimates without introducing bias, including the early use of baselines, state-dependent CVs, and the more recent state-action-dependent CVs. In this work, we analyze the properties and drawbacks of previous CV techniques and, surprisingly, we find that these works have overlooked an important fact: Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation across the trajectories can result in suboptimal variance reduction, and we propose a
Related papers
- Coordinate-wise Control Variates For Deep Policy Gradients (2021)
- Action-depedent Control Variates For Policy Optimization Via Stein's Identity (2017)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- A Simple Mixture Policy Parameterization For Improving Sample Efficiency Of CVaR Optimization (2024)
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)