On The Global Convergence Rates Of Decentralized Softmax Gradient Play In Markov Potential Games
2022 · Runyu Zhang, Jincheng Mei, Bo Dai, et al.
Abstract
Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interest, there can be multiple Nash Equilibria (NEs), which rules out proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special form of game, Markov Potential Games (MPGs), which includes the identical interest game as a special case. We investigate both gradient play and natural gradient play, with and without \(\log\)-barrier regularization. The established conv
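The two difficulties named in the abstract (multiple NEs, and non-NE policies with near-zero gradient under the softmax parameterization) can be illustrated with a minimal sketch. It assumes the simplest special case, a stateless two-player identical-interest matrix game. The reward matrix `R`, the step size `eta`, and all function names are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over a vector of logits.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def softmax_grads(theta1, theta2, R):
    # Each player's gradient of the shared expected reward pi1^T R pi2
    # with respect to its own logits (exact gradients, no sampling).
    pi1, pi2 = softmax(theta1), softmax(theta2)
    q1 = R @ pi2        # player 1's expected reward per action
    q2 = R.T @ pi1      # player 2's expected reward per action
    g1 = pi1 * (q1 - pi1 @ q1)
    g2 = pi2 * (q2 - pi2 @ q2)
    return g1, g2

def gradient_play(theta1, theta2, R, eta=0.5, steps=2000):
    # Decentralized gradient play: each player ascends its own gradient
    # simultaneously, with no coordination between the players.
    for _ in range(steps):
        g1, g2 = softmax_grads(theta1, theta2, R)
        theta1 = theta1 + eta * g1
        theta2 = theta2 + eta * g2
    return softmax(theta1), softmax(theta2)

# Assumed reward matrix: (action 0, action 0) and (action 1, action 1)
# are both pure NEs, with shared rewards 1 and 2 respectively.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# A nearly deterministic profile that is NOT an NE: player 1 plays
# action 0 and player 2 plays action 1 (shared reward close to 0).
g1, g2 = softmax_grads(np.array([20.0, 0.0]), np.array([0.0, 20.0]), R)
print("gradients near the non-NE profile:", g1, g2)

# From a uniform initialization, gradient play moves toward an NE.
p1, p2 = gradient_play(np.zeros(2), np.zeros(2), R)
print("policies after gradient play:", p1, p2)
```

In this sketch, both gradients at the mismatched profile are almost zero even though either player could raise the shared reward by switching actions. This is the sense in which softmax gradients can vanish away from an NE.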
Related papers
- Convergence And Price Of Anarchy Guarantees Of The Softmax Policy Gradient In Markov Potential Games (2022)
- Global Convergence Of Multi-agent Policy Gradient In Markov Potential Games (2021)
- Independent Natural Policy Gradient Always Converges In Markov Potential Games (2021)
- Independent Policy Gradient For Large-scale Markov Potential Games: Sharper Rates, Function Approximation, And Game-agnostic Convergence (2022)
- On The Global Convergence Rates Of Softmax Policy Gradient Methods (2020)
- Softmax Policy Gradient Methods Can Take Exponential Time To Converge (2021)
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023)
- Independent Learning In Constrained Markov Potential Games (2024)