Partial-information Q-learning For General Two-player Stochastic Games
2023 · Negash Medhin, Andrew Papanicolaou, Marwen Zrida
Abstract
In this article we analyze a partial-information Nash Q-learning algorithm for a general 2-player stochastic game. Partial information refers to the setting where a player knows neither the strategy nor the actions of the opposing player. We prove convergence of this partially informed algorithm for general 2-player games with finitely many states and actions, and we confirm that the limiting strategy is in fact a full-information Nash equilibrium. In implementation, partial information offers simplicity because it avoids computing a Nash equilibrium at every time step. In contrast, full-information Q-learning uses the Lemke-Howson algorithm to compute Nash equilibria at every time step, which can be effective but requires several assumptions to prove convergence and may fail at runtime if Lemke-Howson encounters degeneracy. In simulations, the results we obtain with partial information are comparable to those for full-information Q-learning and fictitious play.
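The abstract does not state the update rule, so the following is only a generic sketch of the partial-information idea, not the authors' algorithm: each player keeps a Q-table indexed by state and its own action alone, and updates it from its own reward without seeing the opponent's action or strategy. No Nash equilibrium is computed at any step. The names `independent_q_learning`, `rewards`, and `transitions`, the epsilon-greedy exploration, and all parameter values are assumptions made for illustration.

```python
import random


def independent_q_learning(rewards, transitions, n_states, n_actions,
                           steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1,
                           seed=0):
    """Two independent Q-learners in a 2-player stochastic game (sketch).

    rewards(state, a0, a1) -> (r0, r1)            # hypothetical callback
    transitions(state, a0, a1, rng) -> next_state # hypothetical callback
    Returns q with q[player][state][own_action].
    """
    rng = random.Random(seed)
    q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(2)]
    state = 0
    for _ in range(steps):
        # Each player picks an action from its own table (epsilon-greedy).
        actions = []
        for p in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(n_actions))
            else:
                row = q[p][state]
                actions.append(row.index(max(row)))
        r = rewards(state, actions[0], actions[1])
        nxt = transitions(state, actions[0], actions[1], rng)
        # Partial information: player p's update uses only its own action
        # and its own reward; the opponent's action never enters the table.
        for p in range(2):
            own = actions[p]
            target = r[p] + gamma * max(q[p][nxt])
            q[p][state][own] += alpha * (target - q[p][state][own])
        state = nxt
    return q


# Illustrative single-state game (prisoner's dilemma payoffs, assumed here).
PD = {(0, 0): (3.0, 3.0), (0, 1): (0.0, 5.0),
      (1, 0): (5.0, 0.0), (1, 1): (1.0, 1.0)}

if __name__ == "__main__":
    q = independent_q_learning(lambda s, a0, a1: PD[(a0, a1)],
                               lambda s, a0, a1, rng: 0,
                               n_states=1, n_actions=2)
    print(q)
```

Because each table has one entry per own action rather than one per joint action, the per-step cost is a single lookup and update per player. Whether such a scheme converges to a Nash equilibrium depends on the game and the learning schedule; the convergence result claimed in the abstract applies to the authors' algorithm, not to this sketch.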
Related papers
- Feature-based Q-learning For Two-player Stochastic Games (2019)
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)
- On Information Asymmetry In Competitive Multi-agent Reinforcement Learning: Convergence And Optimality (2020)
- Generalized Individual Q-learning For Polymatrix Games With Partial Observations (2024)
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)
- On The Convergence Of Policy Gradient Methods To Nash Equilibria In General Stochastic Games (2022)
- Balancing Two-player Stochastic Games With Soft Q-learning (2018)
- Decentralized Policy Gradient For Nash Equilibria Learning Of General-sum Stochastic Games (2022)