Abstract

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a \(\gamma\)-discounted infinite-horizon Markov game with \(S\) states, where the max-player has \(A\) actions and the min-player has \(B\) actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an \(\epsilon\)-approximate Nash equilibrium with a sample complexity no larger than \(\frac\{C_\{\mathsf\{clipped\}\}^\{\star\}S(A+B)\}\{(1-\gamma)^\{3\}\epsilon^\{2\}\}\) (up to some log factor). Here, \(C_\{\mathsf\{clipped\}\}^\{\star\}\) is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-\`a-vis the target data), and the target accuracy \(\epsilon\) can be any value within \(\big(0,\frac\{1\}\{1-\gamma\}\big]\). Our sample complexity bound strengthens prior art by a factor of \(\min\\{A

Authors

(none)

Tags

  • Model-Based RL
  • Offline RL
  • Game AI
  • Value-Based

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyyan2022model

Related papers