Can We Find Nash Equilibria At A Linear Rate In Markov Games?
2023 Β· Zhuoqing Song, Jason D. Lee, Zhuoran Yang
Abstract
We study decentralized learning in two-player zero-sum discounted Markov games where the goal is to design a policy optimization algorithm for either agent satisfying two properties. First, the player does not need to know the policy of the opponent to update its policy. Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game. To this end, we construct a meta algorithm, dubbed as \(\texttt\{Homotopy-PO\}\), which provably finds a Nash equilibrium at a global linear rate. In particular, \(\texttt\{Homotopy-PO\}\) interweaves two base algorithms \(\texttt\{Local-Fast\}\) and \(\texttt\{Global-Slow\}\) via homotopy continuation. \(\texttt\{Local-Fast\}\) is an algorithm that enjoys local linear convergence while \(\texttt\{Global-Slow\}\) is an algorithm that converges globally but at a slower sublinear rate. By switching between these two base algorithms, \(\texttt\{Global-Slow\}\) essentially serves as a ``guide'' which identifies a
Authors
(none)
Tags
Stats
Related papers
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)0.00
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)0.00
- On The Convergence Of Policy Gradient Methods To Nash Equilibria In General Stochastic Games (2022)0.00
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)0.00
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023)0.00
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)0.00
- Efficiently Computing Nash Equilibria In Adversarial Team Markov Games (2022)0.00
- Learning Equilibria In Adversarial Team Markov Games: A Nonconvex-hidden-concave Min-max Optimization Problem (2024)0.00