Feature-based Q-learning For Two-player Stochastic Games
2019 Β· Zeyu Jia, Lin F. Yang, Mengdi Wang
Abstract
Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm is shown to find an \(\epsilon\)-optimal strategy using sample size linear to the number of features. To further improve its sample efficiency, we develop an accelerated algorithm by adopting techniques such as variance reduction, monotonicity preservation and two-sided strategy approximation. We prove that the algorithm is guaranteed to find an \(\epsilon\)-optimal strategy using no more than \(\tilde\{\mathcal\{O\}\}(K/(\epsilon^\{2\}(1-\gamma)^\{4\}))\) samples with high probability, where \(K\) is the number of features and \(\gamma\) is a discount factor. The sample, time and space complexities of the algorithm are independent of original dimensions of the game.
Authors
(none)
Tags
Stats
Related papers
- Partial-information Q-learning For General Two-player Stochastic Games (2023)0.00
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)0.00
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)9.03
- Solving Discounted Stochastic Two-player Games With Near-optimal Time And Sample Complexity (2019)0.00
- Balancing Two-player Stochastic Games With Soft Q-learning (2018)0.00
- Learning Nash Equilibria In Zero-sum Stochastic Games Via Entropy-regularized Policy Approximation (2020)0.00
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023)0.00
- On The Heterogeneity Of Independent Learning Dynamics In Zero-sum Stochastic Games (2021)0.00