Model-based Reinforcement Learning For Offline Zero-sum Markov Games

Abstract

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a \(\gamma\)-discounted infinite-horizon Markov game with \(S\) states, where the max-player has \(A\) actions and the min-player has \(B\) actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an \(\epsilon\)-approximate Nash equilibrium with a sample complexity no larger than \(\frac\{C_\{\mathsf\{clipped\}\}^\{\star\}S(A+B)\}\{(1-\gamma)^\{3\}\epsilon^\{2\}\}\) (up to some log factor). Here, \(C_\{\mathsf\{clipped\}\}^\{\star\}\) is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-\`a-vis the target data), and the target accuracy \(\epsilon\) can be any value within \(\big(0,\frac\{1\}\{1-\gamma\}\big]\). Our sample complexity bound strengthens prior art by a factor of \(\min\\{A

Model-based Reinforcement Learning For Offline Zero-sum Markov Games

Abstract

Authors

Tags

Stats

Related papers