Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games
2022 · Shicong Cen, Yuejie Chi, Simon S. Du, et al.
Abstract
Multi-Agent Reinforcement Learning (MARL), where multiple agents learn to interact in a shared dynamic environment, permeates a wide range of critical applications. While there has been substantial progress on understanding the global convergence of policy optimization methods in single-agent RL, the design and analysis of efficient policy optimization algorithms in the MARL setting present significant challenges, which unfortunately remain inadequately addressed by existing theory. In this paper, we focus on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and study equilibrium-finding algorithms in both the infinite-horizon discounted setting and the finite-horizon episodic setting. We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method and the value is updated on a slower t
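To make the policy update named in the abstract concrete, the following is a minimal sketch of entropy-regularized OMWU on a single-state zero-sum matrix game. It is an illustration under stated assumptions, not the paper's Markov-game algorithm: there is no state transition and no slower-timescale value update, and the function names, step size `eta`, and regularization weight `tau` are hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Normalize logits into a probability vector (numerically stable)."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def omwu_step(policy, payoff, eta, tau, sign):
    """One entropy-regularized multiplicative-weights step.

    New policy is proportional to policy**(1 - eta*tau) * exp(sign*eta*payoff).
    sign=+1 for the maximizing player, sign=-1 for the minimizing player.
    """
    logits = [(1.0 - eta * tau) * math.log(p) + sign * eta * g
              for p, g in zip(policy, payoff)]
    return softmax(logits)


def row_payoffs(A, nu):
    """Expected payoff of each row action against column policy nu."""
    return [sum(a * q for a, q in zip(row, nu)) for row in A]


def col_payoffs(A, mu):
    """Expected payoff of each column action against row policy mu."""
    return [sum(A[i][j] * mu[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def entropy_omwu(A, eta=0.1, tau=0.5, iters=3000):
    """Symmetric, single-loop OMWU for max_mu min_nu mu^T A nu with entropy
    regularization of weight tau (illustrative sketch for a matrix game)."""
    m, n = len(A), len(A[0])
    mu, nu = [1.0 / m] * m, [1.0 / n] * n
    mu_bar, nu_bar = mu[:], nu[:]
    for _ in range(iters):
        # Prediction step: each player responds to the opponent's previous midpoint.
        new_mu_bar = omwu_step(mu, row_payoffs(A, nu_bar), eta, tau, +1)
        new_nu_bar = omwu_step(nu, col_payoffs(A, mu_bar), eta, tau, -1)
        # Update step: each player responds to the opponent's new midpoint.
        mu = omwu_step(mu, row_payoffs(A, new_nu_bar), eta, tau, +1)
        nu = omwu_step(nu, col_payoffs(A, new_mu_bar), eta, tau, -1)
        mu_bar, nu_bar = new_mu_bar, new_nu_bar
    return mu, nu


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical payoff matrix for the row player
    mu, nu = entropy_omwu(A)
    print("row policy:", mu)
    print("column policy:", nu)
```

Both players apply the same rule with opposite signs, which is what "symmetric updates" refers to in this sketch. At a fixed point each policy is a softmax response to the other with temperature `tau`, i.e. the regularized equilibrium of the matrix game.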
Related papers
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022) · 0.00
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025) · 0.00
- Cooperative Multi-agent Reinforcement Learning With Partial Observations (2020) · 10.35
- Decentralized Q-learning In Zero-sum Markov Games (2021) · 0.00
- Maximum Entropy Heterogeneous-agent Reinforcement Learning (2023) · 0.00
- Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts (2021) · 0.00
- Global Convergence Of Localized Policy Iteration In Networked Multi-agent Reinforcement Learning (2022) · 2.26
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023) · 0.00