Efficiently Escaping Saddle Points For Policy Optimization
2023 Β· Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, et al.
Abstract
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of \(O(\epsilon^\{-3\})\). However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of \(\tilde\{O\}(\epsilon^\{-3\})\). This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of \(O(\epsilon^\{-0.5\})\). Moreover, the proposed variance reduction technique bypasses IS weight
Authors
(none)
Tags
Stats
Related papers
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)0.00
- Sample Complexity Of Policy Gradient Finding Second-order Stationary Points (2020)0.00
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)0.00
- PAGE-PG: A Simple And Loopless Variance-reduced Policy Gradient Method With Probabilistic Gradient Estimation (2022)0.00
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)0.00
- Smoothing Policies And Safe Policy Gradients (2019)7.50
- A Hybrid Stochastic Policy Gradient Algorithm For Reinforcement Learning (2020)0.00
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)0.00