Sample Efficient Policy Gradient Methods With Recursive Variance Reduction
2019 · Pan Xu, Felicia Gao, Quanquan Gu
Abstract
Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires \(O(1/\epsilon^{3/2})\) episodes to find an \(\epsilon\)-approximate stationary point of the nonconcave performance function \(J(\boldsymbol{\theta})\) (i.e., \(\boldsymbol{\theta}\) such that \(\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon\)). This sample complexity improves the existing result \(O(1/\epsilon^{5/3})\) for stochastic variance reduced policy gradient algorithms by a factor of \(O(1/\epsilon^{1/6})\). In addition, we propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
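The abstract describes SRVR-PG only at a high level. The following is a minimal sketch of a recursive variance-reduced policy gradient loop in that spirit, not the authors' implementation. Everything concrete here is an assumption chosen for illustration: a hypothetical one-step task with reward \(-(a-2)^2\), a Gaussian policy \(\mathcal{N}(\theta, 1)\), a REINFORCE-style per-episode gradient, and hand-picked values for the epoch length, the large batch `N`, the small batch `B`, and the step size. Each epoch starts from a large-batch gradient estimate; later steps update it recursively with small batches drawn from the current policy, using the importance weight \(\omega = p(\tau\mid\boldsymbol{\theta}_{t-1})/p(\tau\mid\boldsymbol{\theta}_t)\) to correct for sampling from the new policy.

```python
import math
import random

SIGMA = 1.0   # hypothetical fixed policy standard deviation
TARGET = 2.0  # hypothetical optimal action of the toy reward


def sample_action(theta, rng):
    # One "episode" of the toy task: a single action a ~ N(theta, SIGMA^2).
    return rng.gauss(theta, SIGMA)


def reward(a):
    # Toy reward; J(theta) = -((theta - TARGET)^2 + SIGMA^2) is maximized at TARGET.
    return -(a - TARGET) ** 2


def reinforce_grad(a, theta):
    # Per-episode estimate g(tau | theta) = R(tau) * d/dtheta log pi_theta(a).
    return reward(a) * (a - theta) / SIGMA ** 2


def importance_weight(a, theta_old, theta_new):
    # omega = pi_{theta_old}(a) / pi_{theta_new}(a) for the Gaussian policy.
    return math.exp(((a - theta_new) ** 2 - (a - theta_old) ** 2) / (2 * SIGMA ** 2))


def srvr_pg(theta0, epochs=20, epoch_len=5, N=500, B=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        # Reference gradient from a large batch of N episodes at the epoch start.
        actions = [sample_action(theta, rng) for _ in range(N)]
        v = sum(reinforce_grad(a, theta) for a in actions) / N
        theta_prev, theta = theta, theta + lr * v  # gradient ascent on J
        for _ in range(epoch_len - 1):
            # Small batch of B episodes from the CURRENT policy.
            batch = [sample_action(theta, rng) for _ in range(B)]
            correction = sum(
                reinforce_grad(a, theta)
                - importance_weight(a, theta_prev, theta) * reinforce_grad(a, theta_prev)
                for a in batch
            ) / B
            v = v + correction  # recursive update of the gradient estimate
            theta_prev, theta = theta, theta + lr * v
    return theta  # last iterate, for simplicity
```

Calling `srvr_pg(0.0)` should move \(\theta\) close to the optimum at 2 on this toy task. The design point being illustrated is that only the first step of each epoch pays for the large batch `N`; the remaining steps reuse it through the importance-weighted recursive correction with the much smaller batch `B`.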
Related papers
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- PAGE-PG: A Simple And Loopless Variance-reduced Policy Gradient Method With Probabilistic Gradient Estimation (2022)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Policy Optimization With Stochastic Mirror Descent (2019)
- Efficiently Escaping Saddle Points For Policy Optimization (2023)