Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games
2024 · Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, et al.
Abstract
In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorporation of the minimax value iteration. To our knowledge, our theoretical results present the first finite-sample analysis of such learning dynamics with last-iterate guarantees. In the matrix game setting, the results imply a sample complexity of \(O(\epsilon^{-1})\) to find the Nash distribution and a sample complexity of \(O(\epsilon^{-8})\) to find a Nash equilibrium. In the stochastic game setting, the results also imply a sample complexity of \(O(\epsilon^{-8})\) to find a Nash equilibrium. To establish these results, the main challenge is to handle stochastic approximation algorithm
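The payoff-based smoothed best-response dynamics for matrix games can be illustrated with a minimal sketch. It assumes a two-player zero-sum matrix game in which player 1 receives `R[a1][a2]` and player 2 receives its negative, and each player observes only its own action and its own realized payoff. Each player keeps a payoff estimate `q` per own action (updated only for the action actually played, with step size `alpha`) and moves its policy slowly (step size `beta`) toward the softmax smoothed best response with temperature `tau`. The fixed point of interest is the (logit) Nash distribution, where each player's policy equals the softmax response to the other player's policy. The function names, the constant step sizes, and the default values are illustrative assumptions, not the authors' algorithm or step-size schedule.

```python
import math
import random


def softmax(values, tau):
    """Smoothed best response: probability of action a is proportional to exp(values[a] / tau)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an action index from the mixed strategy `probs`."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def payoff_based_sbr(R, tau=1.0, alpha=0.02, beta=0.001, steps=100_000,
                     pi1=None, pi2=None, seed=0):
    """Hypothetical sketch of independent, payoff-based smoothed best-response learning.

    Player 1 (row) receives R[a1][a2]; player 2 (column) receives -R[a1][a2].
    Each player sees only its own action and its own realized payoff.
    Returns the last-iterate policies (pi1, pi2).
    """
    rng = random.Random(seed)
    n1, n2 = len(R), len(R[0])
    pi1 = list(pi1) if pi1 is not None else [1.0 / n1] * n1
    pi2 = list(pi2) if pi2 is not None else [1.0 / n2] * n2
    q1, q2 = [0.0] * n1, [0.0] * n2  # per-action payoff estimates
    for _ in range(steps):
        a1, a2 = sample(pi1, rng), sample(pi2, rng)
        r1 = R[a1][a2]
        r2 = -r1
        # Fast time scale: only the estimate of the action actually played moves.
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
        # Slow time scale: move each policy toward its smoothed best response.
        br1, br2 = softmax(q1, tau), softmax(q2, tau)
        pi1 = [p + beta * (b - p) for p, b in zip(pi1, br1)]
        pi2 = [p + beta * (b - p) for p, b in zip(pi2, br2)]
    return pi1, pi2


if __name__ == "__main__":
    pennies = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
    x, y = payoff_based_sbr(pennies, pi1=[0.9, 0.1], pi2=[0.2, 0.8])
    print([round(p, 2) for p in x], [round(p, 2) for p in y])
```

For matching pennies, symmetry makes the uniform strategy pair the Nash distribution at any temperature, so the last iterates should settle near `[0.5, 0.5]` from a skewed start, up to sampling noise. Choosing `alpha` much larger than `beta` is a common two-time-scale choice: the payoff estimates track the opponent's current policy while the policies themselves drift slowly.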
Related papers
- A Finite-sample Analysis Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2023)
- The Harder Path: Last Iterate Convergence For Uncoupled Learning In Zero-sum Games With Bandit Feedback (2026)
- Convergence Of Heterogeneous Learning Dynamics In Zero-sum Stochastic Games (2023)
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023)
- Actor-dual-critic Dynamics For Zero-sum And Identical-interest Stochastic Games (2026)
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)
- Teaching An Old Dynamics New Tricks: Regularization-free Last-iterate Convergence In Zero-sum Games Via BNN Dynamics (2026)