On The Global Convergence Rates Of Softmax Policy Gradient Methods
2020 Β· Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, et al.
Abstract
We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a \(O(1/t)\) rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a \L\{\}ojasiewicz inequality, and the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that it enjoys a significantly faster linear convergence rate \(O(e^\{-c \cdot t\})\) toward softmax optimal policy \((c > 0)\). This result resolves an open question in the recent literature. Finally, combining the above two results and additional new \(Ξ©(1/t)\) lower bound results, we explain how entropy regularization improves policy optimiz
Authors
(none)
Tags
Stats
Related papers
- Softmax Policy Gradient Methods Can Take Exponential Time To Converge (2021)6.34
- On The Global Convergence Rates Of Decentralized Softmax Gradient Play In Markov Potential Games (2022)0.00
- Rethinking The Global Convergence Of Softmax Policy Gradient With Linear Function Approximation (2025)0.00
- On Global Convergence Rates For Federated Softmax Policy Gradient Under Heterogeneous Environments (2025)0.00
- Global Optimality Of Softmax Policy Gradient With Single Hidden Layer Neural Networks In The Mean-field Regime (2020)0.00
- Elementary Analysis Of Policy Gradient Methods (2024)0.00
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)0.00
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)3.58