Policy Optimization For Markov Games: Unified Framework And Faster Convergence
2022 Β· Runyu Zhang, Qinghua Liu, Huan Wang, et al.
Abstract
This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game, as long as the matrix game algorithms achieve low weighted regret at each state, with respect to weights determined by the speed of the value updates. Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an \(\mathcal\{\widetilde\{O\}\}(T^\{-5/6\})\) approximate NE in \(T\) iterations, and a similar algorithm with sligh
Authors
(none)
Tags
Stats
Related papers
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)0.00
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)0.00
- Policy Optimization For Continuous-time Linear-quadratic Graphon Mean Field Games (2025)0.00
- Provably Efficient Fictitious Play Policy Optimization For Zero-sum Markov Games With Structured Transitions (2022)0.00
- A New Policy Iteration Algorithm For Reinforcement Learning In Zero-sum Markov Games (2023)0.00
- Can We Find Nash Equilibria At A Linear Rate In Markov Games? (2023)0.00
- Optimistic Policy Gradient In Multi-player Markov Games With A Single Controller: Convergence Beyond The Minty Property (2023)3.58
- Independent Policy Gradient For Large-scale Markov Potential Games: Sharper Rates, Function Approximation, And Game-agnostic Convergence (2022)0.00