Abstract

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an \(\epsilon\)-approximate CCE in at most \(\widetilde\{O\}( H^6S A /\epsilon^2)\) episodes, where \(S\) is the number of states, \(A\) is the size of the largest individual action space, and \(H\) is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov

Authors

(none)

Tags

  • Multi-Agent
  • Game AI

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keymao2021provably

Related papers