Abstract

A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in the basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), and correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting), in a number of samples that scales only with \(\max_{i\in[m]} A_i\), where \(A_i\) is the number of actions for the \(i^{\rm th}\) player. This is in sharp contrast to the size of the joint action space, which is \(\prod_{i=1}^m A_i\). V-learning (in its basic form) is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm wit
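To make the gap between the two quantities in the abstract concrete, the following is a minimal arithmetic sketch. It is not part of the paper: the function names and the example action counts are hypothetical, chosen only to compare \(\prod_{i=1}^m A_i\) with \(\max_{i\in[m]} A_i\).

```python
from math import prod


def joint_action_space_size(action_counts):
    """Size of the joint action space: the product of the A_i."""
    return prod(action_counts)


def max_individual_actions(action_counts):
    """Largest per-player action count: the max over the A_i."""
    return max(action_counts)


# Hypothetical example: m = 5 players, each with A_i = 10 actions.
action_counts = [10] * 5
print(joint_action_space_size(action_counts))  # 100000
print(max_individual_actions(action_counts))   # 10
```

With five players the joint action space already has 100,000 elements while the per-player maximum stays at 10, and each additional player multiplies the former but leaves the latter unchanged.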

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 0
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: jin2021v

Related papers