Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games

Abstract

This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the \textit\{suboptimality gap\}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an \(\epsilon\)-Nash Equilibrium (NE) within \(\mathcal\{O\}(1/\epsilon)\) iterations. This improves upon the previous best result of \(\mathcal\{O\}(1/\epsilon^2)\) iterations and is of the same order, \(\mathcal\{O\}(1/\epsilon)\), that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.

Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games

Abstract

Authors

Tags

Stats

Related papers