Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions
2023 Β· Reda Ouhamma, Maryam Kamgarpour
Abstract
We address payoff-based decentralized learning in infinite-horizon zero-sum Markov games. In this setting, each player makes decisions based solely on received rewards, without observing the opponent's strategy or actions nor sharing information. Prior works established finite-time convergence to an approximate Nash equilibrium under strong reachability and mixing time assumptions. We propose a convergent algorithm that significantly relaxes these assumptions, requiring only the existence of a single policy (not necessarily known) with bounded reachability and mixing time. Our key technical novelty is introducing Tsallis entropy regularization to smooth the best-response policy updates. By suitably tuning this regularization, we ensure sufficient exploration, thus bypassing previous stringent assumptions on the MDP. By establishing novel properties of the value and policy updates induced by the Tsallis entropy regularizer, we prove finite-time convergence to an approximate Nash equilib
Authors
(none)
Tags
Stats
Related papers
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024)0.00
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)0.00
- Can We Find Nash Equilibria At A Linear Rate In Markov Games? (2023)0.00
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)0.00
- Convergence Of Heterogeneous Learning Dynamics In Zero-sum Stochastic Games (2023)2.26
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)0.00
- A Finite-sample Analysis Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2023)0.00
- Decentralized Q-learning In Zero-sum Markov Games (2021)0.00