A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games
2019 Β· Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar
Abstract
We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this work, we compute the solution of the two-player zero-sum game utilizing the technique of successive relaxation that has been successfully applied in the literature to compute a faster value iteration algorithm in the context of Markov Decision Processes. We extend the concept of successive relaxation to the setting of two-player zero-sum games. We show that, under a special structure on the game, this technique facilitates faster computation of the min-max value of the states. We then derive a generalized minimax Q-learning algorithm that computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm utilizing stochastic approximation t
Authors
(none)
Tags
Stats
Related papers
- Finite-time Analysis Of Minimax Q-learning For Two-player Zero-sum Markov Games: Switching System Approach (2023)0.00
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)0.00
- Partial-information Q-learning For General Two-player Stochastic Games (2023)0.00
- Towards General Function Approximation In Zero-sum Markov Games (2021)0.00
- A New Policy Iteration Algorithm For Reinforcement Learning In Zero-sum Markov Games (2023)0.00
- Feature-based Q-learning For Two-player Stochastic Games (2019)0.00
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024)0.00
- FM3Q: Factorized Multi-agent Minimax Q-learning For Two-team Zero-sum Markov Game (2024)6.34