Abstract

Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous \(N\)-player games. However, existing theoretical results assume variations of a "population generative model", which allows the learning algorithm to modify the population distribution arbitrarily; this limits their applicability. Moreover, learning algorithms typically operate on abstract population simulators rather than on the \(N\)-player game itself. Instead, we show that \(N\) agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within \(\widetilde{\mathcal{O}}(\epsilon^{-2})\) samples from a single sample trajectory without a population generative model, up to a standard \(\mathcal{O}(\frac{1}{\sqrt{N}})\) error due to the mean field. Departing from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive o

Authors

(none)

Tags

  • Game AI

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: yardim2022policy

Related papers