A Sharp Analysis Of Model-based Reinforcement Learning With Self-play
2020 Β· Qinghua Liu, Tiancheng Yu, Yu Bai, et al.
Abstract
Model-based algorithms -- algorithms that explore the environment through building and utilizing an estimated model -- are widely used in reinforcement learning practice and theoretically shown to achieve optimal sample efficiency for single-agent reinforcement learning in Markov Decision Processes (MDPs). However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches. In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games that is able to output an \(\epsilon\)-approximate Nash policy in \(\tilde\{\mathcal\{O\}\}(H^3SAB/\epsilon^2)\) episodes of game playing, where \(S\) is the number of states, \(A,B\) are the number of actions for the two players respectively, and \(H\) is the
Authors
(none)
Tags
Stats
Related papers
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023)0.00
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)2.26
- Efficient Competitive Self-play Policy Optimization (2020)0.00
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)0.00
- Model-based Reinforcement Learning For Offline Zero-sum Markov Games (2022)0.00
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)0.00
- Provably Efficient Information-directed Sampling Algorithms For Multi-agent Reinforcement Learning (2024)2.26
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)0.00