Generalized Bandit Regret Minimizer Framework In Imperfect Information Extensive-form Game
2022 · Linjian Meng, Yang Gao
Abstract
Regret minimization methods are a powerful tool for learning an approximate Nash equilibrium (NE) in two-player zero-sum imperfect information extensive-form games (IIEGs). We consider the problem in the interactive bandit-feedback setting, where the dynamics of the IIEG are unknown. In general, only the interactive trajectory and the value of the reached terminal node \(v(z^t)\) are revealed. To learn an NE, the regret minimizer must estimate the full-feedback loss gradient \(\ell^t\) from \(v(z^t)\) and minimize the regret. In this paper, we propose a generalized framework for this learning setting. It provides a theoretical framework for the design and modular analysis of bandit regret minimization methods. We demonstrate that the most recent bandit regret minimization methods can be analyzed as particular cases of our framework. Following this framework, we describe a novel method, SIX-OMD, to learn an approximate NE. It is model-free and significantly improves the best existing conver
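The estimation step the abstract describes (recovering a full loss vector from a single observed value, then running a regret minimizer on the estimate) can be illustrated in the simplest possible case. The sketch below is not SIX-OMD and is not taken from the paper: it assumes a single decision point with fixed losses, uses the standard importance-weighted estimator, and pairs it with online mirror descent under the negative-entropy regularizer (an EXP3-style update). All function names and parameters are hypothetical.

```python
import numpy as np


def importance_weighted_loss(num_actions, action, observed_loss, prob):
    """Estimate the full loss vector from the one entry that was observed.

    Only the sampled action's loss is seen; dividing it by the sampling
    probability makes the estimate unbiased in expectation.
    """
    estimate = np.zeros(num_actions)
    estimate[action] = observed_loss / prob
    return estimate


def entropic_omd_strategy(cumulative_estimate, lr):
    """Closed-form OMD iterate for the negative-entropy regularizer.

    This is a softmax over the negated, scaled cumulative loss estimates.
    """
    logits = -lr * cumulative_estimate
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def run_bandit_omd(true_losses, steps=5000, lr=0.05, seed=0):
    """Toy loop (assumption: one decision point, fixed losses in [0, 1])."""
    rng = np.random.default_rng(seed)
    true_losses = np.asarray(true_losses, dtype=float)
    n = len(true_losses)
    cumulative_estimate = np.zeros(n)
    strategy = np.full(n, 1.0 / n)
    for _ in range(steps):
        action = rng.choice(n, p=strategy)
        # Bandit feedback: only the loss of the sampled action is revealed.
        observed = true_losses[action]
        cumulative_estimate += importance_weighted_loss(
            n, action, observed, strategy[action]
        )
        strategy = entropic_omd_strategy(cumulative_estimate, lr)
    return strategy


if __name__ == "__main__":
    print(run_bandit_omd([0.9, 0.1, 0.5]))
```

In an extensive-form game the same idea has to be applied across many information sets, with sampling probabilities that depend on the whole trajectory; the toy loop above leaves all of that out.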
Related papers
- Adversarial Learning In Games With Bandit Feedback: Logarithmic Pure-strategy Maximin Regret (2026)
- Model-free Learning For Two-player Zero-sum Partially Observable Markov Games With Perfect Recall (2021)
- Best Of Both Worlds: Regret Minimization Versus Minimax Play (2025)
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)
- Sample-efficient Learning Of Correlated Equilibria In Extensive-form Games (2022)
- Model-free Online Learning In Unknown Sequential Decision Making Problems And Games (2021)
- Evolutionary Dynamics And \(\phi\)-regret Minimization In Games (2021)
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)