Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games
2023 Β· Songtao Feng, Ming Yin, Yu-Xiang Wang, et al.
Abstract
The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an \(\epsilon\)-optimal Nash Equilibrium (NE) with the sample complexity of \(O(H^3SAB/\epsilon^2)\), which is optimal in the dependence of the horizon \(H\) and the number of states \(S\) (where \(A\) and \(B\) denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the \(H\) dependence as model-based algorithms. The main improvement of the dependency on
Authors
(none)
Tags
Stats
Related papers
- A Sharp Analysis Of Model-based Reinforcement Learning With Self-play (2020)0.00
- Model-based Reinforcement Learning For Offline Zero-sum Markov Games (2022)0.00
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)2.26
- Learning Zero-sum Linear Quadratic Games With Improved Sample Complexity And Last-iterate Convergence (2023)0.00
- Towards General Function Approximation In Zero-sum Markov Games (2021)0.00
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)0.00
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)0.00
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)0.00