On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method
2021 · Junyu Zhang, Chengzhuo Ni, Zheng Yu, et al.
Abstract
Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods such as REINFORCE with *variance reduction* techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. We show an \(\tilde{\mathcal{O}}(\epsilon^{-3})\) sample complexity for TSIVR-PG to find an \(\epsilon\)-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges
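The abstract names a gradient truncation mechanism but does not spell it out. One plausible reading, sketched here as an assumption and not as the authors' exact rule, is a cap on the length of each parameter update: consecutive policies then stay close, which is what keeps importance weights bounded without assuming it. The function name `truncated_step` and the parameters `step_size` and `radius` are hypothetical.

```python
import math

def truncated_step(theta, grad, step_size, radius):
    """Gradient-ascent step whose Euclidean length is capped at `radius`.

    Illustrative only: `theta` and `grad` are plain lists of floats, and the
    cap rule is an assumed form of truncation, not taken from the paper.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if step_size * norm <= radius:
        # The plain step is already short enough, so take it unchanged.
        scale = step_size
    else:
        # Rescale so the step has length exactly `radius`.
        scale = radius / norm
    return [t + scale * g for t, g in zip(theta, grad)]

# A small gradient passes through; a large one is shortened to the cap.
small = truncated_step([0.0, 0.0], [0.3, 0.4], step_size=0.1, radius=1.0)
large = truncated_step([0.0, 0.0], [30.0, 40.0], step_size=0.1, radius=1.0)
```

With these inputs, `small` moves a distance of 0.05 in the gradient direction, while `large` would have moved 5.0 and is shortened to a distance of 1.0.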
Related papers
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)
- PAGE-PG: A Simple And Loopless Variance-reduced Policy Gradient Method With Probabilistic Gradient Estimation (2022)
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Efficiently Escaping Saddle Points For Policy Optimization (2023)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Reusing Trajectories In Policy Gradients Enables Fast Convergence (2025)