Policy Mirror Ascent For Efficient And Independent Learning In Mean Field Games

Abstract

Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous \(N\)-player games. However, existing theoretical results assume variations of a "population generative model", which allows the learning algorithm to modify the population distribution arbitrarily, and this limits their applicability. Moreover, learning algorithms typically work on abstract simulators with a population rather than on the \(N\)-player game itself. Instead, we show that \(N\) agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within \(\widetilde{\mathcal{O}}(\epsilon^{-2})\) samples from a single sample trajectory without a population generative model, up to a standard \(\mathcal{O}(\frac{1}{\sqrt{N}})\) error due to the mean field. Departing from the literature, instead of working with the best-response map we first show that a policy mirror ascent map can be used to construct a contractive o
