Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization
2020 · Shicong Cen, Chen Cheng, Yuxin Chen, et al.
Abstract
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that encourages exploration -- and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops non-asymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly -- or even quadratically once it enters a local region around the optimal policy -- when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of po
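A minimal sketch of the tabular setting the abstract describes may help make it concrete. The abstract does not spell out the update rule; the multiplicative form used below, pi_new(a|s) proportional to pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta*Q_tau(s,a)/(1-gamma)), is the standard entropy-regularized NPG update under softmax parameterization as we understand it, and should be read as an assumption here. The function names (`random_mdp`, `soft_q`, `npg_step`, `run_npg`), the random MDP, and the default values of `gamma`, `tau`, and `iters` are all hypothetical choices made for illustration.

```python
import numpy as np


def random_mdp(num_states, num_actions, seed=0):
    """Hypothetical toy MDP: random transition kernel P[s, a, s'] and rewards r[s, a] in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((num_states, num_actions, num_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a, :] is a probability distribution
    r = rng.random((num_states, num_actions))
    return P, r


def soft_q(P, r, pi, gamma, tau):
    """Exact policy evaluation of the entropy-regularized (soft) Q-function of pi."""
    num_states = r.shape[0]
    # State-to-state transition matrix under pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Expected one-step reward plus tau times the entropy of pi(.|s).
    r_pi = np.sum(pi * (r - tau * np.log(pi)), axis=1)
    # Soft value function solves V = r_pi + gamma * P_pi V.
    V = np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)
    return r + gamma * (P @ V)  # shape (num_states, num_actions)


def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG step (assumed update form, see lead-in)."""
    logits = (1.0 - eta * tau / (1.0 - gamma)) * np.log(pi) + eta * Q / (1.0 - gamma)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)


def run_npg(P, r, gamma=0.9, tau=0.5, eta=None, iters=200):
    """Run NPG from the uniform policy; eta defaults to (1 - gamma) / tau."""
    num_states, num_actions = r.shape
    if eta is None:
        eta = (1.0 - gamma) / tau  # with this step size the update is soft policy iteration
    pi = np.full((num_states, num_actions), 1.0 / num_actions)
    for _ in range(iters):
        Q = soft_q(P, r, pi, gamma, tau)
        pi = npg_step(pi, Q, eta, gamma, tau)
    return pi, soft_q(P, r, pi, gamma, tau)
```

Calling `run_npg(*random_mdp(5, 3))` returns the final policy and its soft Q-function. One way to sanity-check convergence is to compare the returned policy with the softmax of `Q / tau`, since the optimal policy of the regularized MDP is a fixed point of that map. Smaller step sizes `eta` keep more of the previous policy at each step and are expected to converge more slowly.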
Related papers
- Linear Convergence Of Entropy-regularized Natural Policy Gradient With Linear Function Approximation (2021) · 6.34
- Linear Convergence Of Independent Natural Policy Gradient In Games With Entropy Regularization (2024) · 3.58
- Matryoshka Policy Gradient For Entropy-regularized RL: Convergence And Global Optimality (2023) · 0.00
- Beyond Exact Gradients: Convergence Of Stochastic Soft-max Policy Gradient Methods With Entropy Regularization (2021) · 2.26
- Symmetric (optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022) · 0.00
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023) · 0.00
- Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction (2024) · 0.00
- Convergence Of Policy Gradient For Entropy Regularized MDPs With Neural Network Approximation In The Mean-field Regime (2022) · 0.00