Abstract

This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the \textit\{suboptimality gap\}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an \(\epsilon\)-Nash Equilibrium (NE) within \(\mathcal\{O\}(1/\epsilon)\) iterations. This improves upon the previous best result of \(\mathcal\{O\}(1/\epsilon^2)\) iterations and is of the same order, \(\mathcal\{O\}(1/\epsilon)\), that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.

Authors

(none)

Tags

  • Policy Gradient
  • Multi-Agent
  • Game AI

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keysun2023provably

Related papers