Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games
2023 Β· Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, et al.
Abstract
We consider two-player zero-sum stochastic games and propose a two-timescale \(Q\)-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale \(Q\)-learning, the fast-timescale iterates are updated in spirit to the stochastic gradient descent and the slow-timescale iterates (which we use to compute the policies) are updated by taking a convex combination between its previous iterate and the latest fast-timescale iterate. Introducing the slow timescale as well as its update equation marks as our main algorithmic novelty. In the special case of linear function approximation, we establish, to the best of our knowledge, the first last-iterate finite-sample bound for payoff-based independent learning dynamics of these types. The result implies a polynomial sample complexity to find a Nash equilibrium in such stochastic games. To establish the results, we model our proposed algorithm as a two-timescale
Authors
(none)
Tags
Stats
Related papers
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)0.00
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)9.03
- Feature-based Q-learning For Two-player Stochastic Games (2019)0.00
- Learning Nash Equilibria In Zero-sum Stochastic Games Via Entropy-regularized Policy Approximation (2020)0.00
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024)0.00
- Fictitious Play In Zero-sum Stochastic Games (2020)0.00
- Towards General Function Approximation In Zero-sum Markov Games (2021)0.00
- A Finite-sample Analysis Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2023)0.00