Abstract

Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games but is not applicable to general-sum Markov games. It is known that the best-response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of *sample complexity* for model-based MARL algorithms in general-sum Markov games. We show two results. We first use Hoeffding inequality based bounds to show that \(\tilde\{\mathcal\{O\}\}( (1-\gamma)^\{-4\} \alpha^\{-2\})\) samples per state-action pair are sufficient to obtain a \(\alpha

Authors

(none)

Tags

  • Model-Based RL
  • Multi-Agent
  • Game AI
  • Value-Based

Stats

Related papers