Elementary Analysis Of Policy Gradient Methods
2024 · Jiacai Liu, Wenye Li, Ke Wei
Abstract
Projected policy gradient under the simplex parameterization, together with policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There has been a flurry of recent activity studying these algorithms from a theoretical perspective. Despite this, their convergence behavior is still not fully understood, even given access to exact policy evaluations. In this paper, we focus on the discounted MDP setting and conduct a systematic study of these policy optimization methods. Several novel results are presented, including 1) global linear convergence of projected policy gradient for any constant step size, 2) sublinear convergence of softmax policy gradient for any constant step size, 3) global linear convergence of softmax natural policy gradient for any constant step size, 4) global linear convergence of entropy-regularized softmax policy gradient for a wider range of constant step sizes than existing results…
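To make the three unregularized updates in the abstract concrete, here is a minimal sketch with exact policy evaluation. It assumes a hypothetical 2-state, 2-action MDP, a step size of 1, and the standard gradient expressions from the policy-gradient literature: ∂V(ρ)/∂π(a|s) = d(s)Q(s,a)/(1-γ) for the simplex parameterization, ∂V(ρ)/∂θ(s,a) = d(s)π(a|s)A(s,a)/(1-γ) for softmax, and the common NPG form θ ← θ + ηA/(1-γ). The MDP, step sizes, and all function names are illustrative assumptions, not the paper's code or experiments, and the entropy-regularized variant is not shown.

```python
import math

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a][t] = probability of moving from s to t under action a; R[s][a] = reward.
GAMMA = 0.9
S, A = 2, 2
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]
RHO = [0.5, 0.5]  # initial-state distribution rho


def q_from_v(V):
    """One Bellman backup: Q(s, a) = R(s, a) + GAMMA * E[V(s')]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def evaluate(pi, iters=300):
    """Policy evaluation by fixed-point iteration (error shrinks like GAMMA**iters)."""
    V = [0.0] * S
    for _ in range(iters):
        Q = q_from_v(V)
        V = [sum(pi[s][a] * Q[s][a] for a in range(A)) for s in range(S)]
    return V, q_from_v(V)


def visitation(pi, iters=300):
    """Discounted visitation d(t) = (1-GAMMA) * sum_k GAMMA**k * Pr(s_k = t), s_0 ~ RHO."""
    d = list(RHO)
    for _ in range(iters):
        d = [(1 - GAMMA) * RHO[t]
             + GAMMA * sum(d[s] * pi[s][a] * P[s][a][t] for s in range(S) for a in range(A))
             for t in range(S)]
    return d


def value(pi):
    """V^pi(rho) = sum_s rho(s) V^pi(s)."""
    V, _ = evaluate(pi)
    return sum(RHO[s] * V[s] for s in range(S))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [x / z for x in e]


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    css, shift = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            shift = t  # the last index satisfying the condition fixes the shift
    return [max(x - shift, 0.0) for x in v]


def projected_pg_step(pi, eta):
    """Projected PG: pi(.|s) <- Proj(pi(.|s) + eta * d(s) Q(s, .) / (1-GAMMA))."""
    _, Q = evaluate(pi)
    d = visitation(pi)
    return [project_simplex([pi[s][a] + eta * d[s] * Q[s][a] / (1 - GAMMA)
                             for a in range(A)]) for s in range(S)]


def softmax_pg_step(theta, eta):
    """Softmax PG: theta(s,a) += eta * d(s) pi(a|s) A(s,a) / (1-GAMMA)."""
    pi = [softmax(th) for th in theta]
    V, Q = evaluate(pi)
    d = visitation(pi)
    return [[theta[s][a] + eta * d[s] * pi[s][a] * (Q[s][a] - V[s]) / (1 - GAMMA)
             for a in range(A)] for s in range(S)]


def softmax_npg_step(theta, eta):
    """Softmax NPG (common form): theta(s,a) += eta * A(s,a) / (1-GAMMA)."""
    pi = [softmax(th) for th in theta]
    V, Q = evaluate(pi)
    return [[theta[s][a] + eta * (Q[s][a] - V[s]) / (1 - GAMMA)
             for a in range(A)] for s in range(S)]


# Run all three methods from the uniform policy with the same constant step size.
pi_ppg = [[1.0 / A] * A for _ in range(S)]
theta_pg = [[0.0] * A for _ in range(S)]
theta_npg = [[0.0] * A for _ in range(S)]
for _ in range(100):
    pi_ppg = projected_pg_step(pi_ppg, eta=1.0)
    theta_pg = softmax_pg_step(theta_pg, eta=1.0)
    theta_npg = softmax_npg_step(theta_npg, eta=1.0)

for name, pi in [("projected PG", pi_ppg),
                 ("softmax PG", [softmax(th) for th in theta_pg]),
                 ("softmax NPG", [softmax(th) for th in theta_npg])]:
    print(f"{name:12s} V(rho) = {value(pi):.4f}")
```

In this toy instance the optimal value is V*(ρ) = 19.5 (always take action 1). Projected PG and NPG reach it essentially exactly within a few iterations, while softmax PG approaches it more slowly, which is in line with the linear versus sublinear rates listed in the abstract; a single small MDP is of course only an illustration, not evidence for those rates.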
Related papers
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Softmax Policy Gradient Methods Can Take Exponential Time To Converge (2021)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- On The Global Convergence Rates Of Decentralized Softmax Gradient Play In Markov Potential Games (2022)
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- On The Global Convergence Rates Of Softmax Policy Gradient Methods (2020)
- Why Policy Gradient Algorithms Work For Undiscounted Total-Reward MDPs (2025)