Abstract

The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an \(\epsilon\)-optimal Nash Equilibrium (NE) with the sample complexity of \(O(H^3SAB/\epsilon^2)\), which is optimal in the dependence of the horizon \(H\) and the number of states \(S\) (where \(A\) and \(B\) denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the \(H\) dependence as model-based algorithms. The main improvement of the dependency on

Authors

(none)

Tags

  • Model-Based RL
  • Multi-Agent
  • Game AI

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyfeng2023improving

Related papers