A Finite-sample Analysis Of Payoff-based Independent Learning In Zero-sum Stochastic Games
2023 Β· Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, et al.
Abstract
We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known \(\tilde\{\mathcal\{O\}\}(1/\epsilon^2)\) sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper \(\tilde\{\mathcal\{O\}\}(1/\epsilon)\) sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
Authors
(none)
Tags
Stats
Related papers
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024)0.00
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)0.00
- Actor-dual-critic Dynamics For Zero-sum And Identical-interest Stochastic Games (2026)0.00
- Convergence Of Heterogeneous Learning Dynamics In Zero-sum Stochastic Games (2023)2.26
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023)0.00
- Best-response Dynamics And Fictitious Play In Identical-interest And Zero-sum Stochastic Games (2021)0.00
- Episodic Logit-q Dynamics For Efficient Learning In Stochastic Teams (2022)0.00
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)9.03