Fictitious Play In Zero-sum Stochastic Games
2020 · Muhammed O. Sayin, Francesca Parise, Asuman Ozdaglar
Abstract
We present a novel variant of fictitious play dynamics that combines classical fictitious play with Q-learning for stochastic games, and we analyze its convergence properties in two-player zero-sum stochastic games. In our dynamics, players form beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and play a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated at a slower timescale than the beliefs on strategies. We show that, in both the model-based case and the model-free case (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
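To make the two-timescale idea concrete, here is a minimal model-based sketch in Python. It is an illustration under stated assumptions, not the paper's exact update rule: the function name `two_timescale_fp`, the array layout of `r` and `P`, the discount factor `gamma`, and the step sizes `alpha` and `beta` are all hypothetical choices. The only property taken from the abstract is that the Q-function beliefs move more slowly than the strategy beliefs, which here is approximated by picking step sizes with `beta / alpha` tending to zero.

```python
import numpy as np

def two_timescale_fp(r, P, gamma=0.9, steps=5000, seed=0):
    """Illustrative two-timescale fictitious play (assumed form, model-based).

    r[s, a, b]     : stage payoff to player 1 (player 2 receives -r)
    P[s, a, b, s'] : state transition probabilities
    """
    rng = np.random.default_rng(seed)
    S, A, B = r.shape
    pi1 = np.full((S, A), 1.0 / A)  # player 2's belief on player 1's strategy
    pi2 = np.full((S, B), 1.0 / B)  # player 1's belief on player 2's strategy
    Q1 = np.zeros((S, A, B))        # player 1's belief on its own Q-function
    Q2 = np.zeros((S, A, B))        # player 2's belief on its own Q-function
    visits = np.zeros(S, dtype=int)
    s = 0
    for _ in range(steps):
        visits[s] += 1
        n = visits[s]
        alpha = 1.0 / n                            # fast step (strategy beliefs)
        beta = 1.0 / (1.0 + n * np.log(n + 1.0))   # slow step (Q beliefs); assumed schedule

        # Greedy best responses against current beliefs, using estimated Q.
        a = int(np.argmax(Q1[s] @ pi2[s]))
        b = int(np.argmax(pi1[s] @ Q2[s]))

        # Fast timescale: empirical-frequency update from observed opponent actions.
        pi2[s] += alpha * (np.eye(B)[b] - pi2[s])
        pi1[s] += alpha * (np.eye(A)[a] - pi1[s])

        # Estimated continuation values: best-response value against beliefs.
        v1 = np.max(np.einsum("sab,sb->sa", Q1, pi2), axis=1)
        v2 = np.max(np.einsum("sa,sab->sb", pi1, Q2), axis=1)

        # Slow timescale: model-based Q update using known r and P.
        Q1[s] += beta * (r[s] + gamma * (P[s] @ v1) - Q1[s])
        Q2[s] += beta * (-r[s] + gamma * (P[s] @ v2) - Q2[s])

        # Sample the next state from the transition kernel.
        s = rng.choice(S, p=P[s, a, b])
    return pi1, pi2, Q1, Q2

# Usage: single-state matching pennies, whose mixed equilibrium is uniform.
r = np.array([[[1.0, -1.0], [-1.0, 1.0]]])
P = np.ones((1, 2, 2, 1))
pi1, pi2, Q1, Q2 = two_timescale_fp(r, P)
print(pi1, pi2)
```

In the single-state matching-pennies usage above, the sketch reduces to classical fictitious play, so the strategy beliefs should drift toward the uniform mixture. A model-free variant would replace the known `r[s]` and `P[s]` in the Q update with observed payoffs and sampled next states; that variant is not shown here.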
Related papers
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)
- Best-response Dynamics And Fictitious Play In Identical-interest And Zero-sum Stochastic Games (2021)
- On The Global Convergence Of Stochastic Fictitious Play In Stochastic Games With Turn-based Controllers (2022)
- Fictitious Play In Markov Games With Single Controller (2022)
- Actor-dual-critic Dynamics For Zero-sum And Identical-interest Stochastic Games (2026)
- Convergence Of Heterogeneous Learning Dynamics In Zero-sum Stochastic Games (2023)
- Anticipatory Fictitious Play (2022)
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)