Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction
2024 · Jie Feng, Ke Wei, Jinchi Chen
Abstract
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, this paper develops a new NPG variant, coined NPG-HM, which uses the Hessian-aided momentum technique for variance reduction and solves the sub-problem via stochastic gradient descent. It is shown that NPG-HM achieves global last-iterate \(\epsilon\)-optimality with a sample complexity of \(\mathcal{O}(\epsilon^{-2})\), which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art
Related papers
- Symmetric (Optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022)
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- Stochastic Policy Gradient Methods: Improved Sample Complexity For Fisher-non-degenerate Policies (2023)
- Improved Sample Complexity Analysis Of Natural Policy Gradient Algorithm With General Parameterization For Infinite Horizon Discounted Reward Markov Decision Processes (2023)
- Linear Convergence Of Entropy-regularized Natural Policy Gradient With Linear Function Approximation (2021)
- Improved Communication Efficiency In Federated Natural Policy Gradient Via ADMM-based Gradient Updates (2023)