Convergence And Price Of Anarchy Guarantees Of The Softmax Policy Gradient In Markov Potential Games
2022 · Dingyang Chen, Qi Zhang, Thinh T. Doan
Abstract
We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized with general function approximators such as neural networks. We first show the asymptotic convergence of this method to a Nash equilibrium of MPGs for tabular softmax policies. Second, we derive the finite-time performance of the policy gradient in two settings: 1) using the log-barrier regularization, and 2) using the natural policy gradient under the best-response dynamics (NPG-BR). Finally, extending the notion of price of anarchy (POA) and smoothness in normal-form games, we introduce the PO
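The abstract's setting can be illustrated on the simplest MPG: a single-state, two-player identical-interest game, where the shared reward is itself the potential. The sketch below is an illustrative assumption, not the authors' algorithm or experimental setup. It runs independent tabular softmax policy gradient with exact gradients, plus an optional log-barrier term in the common form (λ/|A|) Σ_a log π(a). The function name `softmax_pg`, the regularization weight `lam`, the payoff matrix, the step size, and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = np.exp(logits - logits.max())
    return z / z.sum()


def softmax_pg(R, steps=2000, lr=1.0, lam=0.0):
    """Hypothetical sketch: independent softmax policy gradient ascent
    for a two-player identical-interest matrix game with payoff R."""
    n1, n2 = R.shape
    theta1 = np.zeros(n1)  # agent 1 logits (uniform start)
    theta2 = np.zeros(n2)  # agent 2 logits (uniform start)
    for _ in range(steps):
        pi1, pi2 = softmax(theta1), softmax(theta2)
        q1 = R @ pi2    # agent 1's expected reward for each of its actions
        q2 = R.T @ pi1  # agent 2's expected reward for each of its actions
        J = pi1 @ q1    # shared expected reward, i.e. the potential
        # Exact softmax gradient: dJ/dtheta_i[a] = pi_i[a] * (q_i[a] - J)
        g1 = pi1 * (q1 - J)
        g2 = pi2 * (q2 - J)
        # Gradient of the log-barrier (lam / |A_i|) * sum_a log pi_i[a]
        g1 += lam * (1.0 / n1 - pi1)
        g2 += lam * (1.0 / n2 - pi2)
        # Simultaneous (independent) updates for both agents
        theta1 += lr * g1
        theta2 += lr * g2
    pi1, pi2 = softmax(theta1), softmax(theta2)
    return pi1, pi2, pi1 @ R @ pi2


R = np.array([[1.0, 0.0], [0.0, 2.0]])
pi1, pi2, J = softmax_pg(R)
print(pi1.round(3), pi2.round(3), round(float(J), 3))
```

In this payoff matrix both joint actions (0, 0) and (1, 1) are Nash equilibria; from uniform policies the iterates move toward the higher-payoff equilibrium (1, 1). Passing `lam > 0` keeps every action probability bounded away from zero, at the cost of a slightly lower shared reward.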
Related papers
- On The Global Convergence Rates Of Decentralized Softmax Gradient Play In Markov Potential Games (2022)
- Independent Policy Gradient For Large-scale Markov Potential Games: Sharper Rates, Function Approximation, And Game-agnostic Convergence (2022)
- Global Convergence Of Multi-agent Policy Gradient In Markov Potential Games (2021)
- Independent Natural Policy Gradient Always Converges In Markov Potential Games (2021)
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023)
- Matryoshka Policy Gradient For Entropy-regularized RL: Convergence And Global Optimality (2023)
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)
- Softmax Policy Gradient Methods Can Take Exponential Time To Converge (2021)