Model-based Reinforcement Learning For Offline Zero-sum Markov Games
2022 Β· Yuling Yan, Gen Li, Yuxin Chen, et al.
Abstract
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a \(\gamma\)-discounted infinite-horizon Markov game with \(S\) states, where the max-player has \(A\) actions and the min-player has \(B\) actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an \(\epsilon\)-approximate Nash equilibrium with a sample complexity no larger than \(\frac\{C_\{\mathsf\{clipped\}\}^\{\star\}S(A+B)\}\{(1-\gamma)^\{3\}\epsilon^\{2\}\}\) (up to some log factor). Here, \(C_\{\mathsf\{clipped\}\}^\{\star\}\) is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-\`a-vis the target data), and the target accuracy \(\epsilon\) can be any value within \(\big(0,\frac\{1\}\{1-\gamma\}\big]\). Our sample complexity bound strengthens prior art by a factor of \(\min\\{A
Authors
(none)
Tags
Stats
Related papers
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning From Offline Datasets (2022)0.00
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023)0.00
- Model-free Learning For Two-player Zero-sum Partially Observable Markov Games With Perfect Recall (2021)0.00
- Corruption-robust Offline Two-player Zero-sum Markov Games (2024)0.00
- A Sharp Analysis Of Model-based Reinforcement Learning With Self-play (2020)0.00
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)2.26
- Pessimism-free Offline Learning In General-sum Games Via KL Regularization (2026)0.00
- Can We Find Nash Equilibria At A Linear Rate In Markov Games? (2023)0.00