Adversarial Learning In Games With Bandit Feedback: Logarithmic Pure-strategy Maximin Regret
2026 Β· Shinji Ito, Haipeng Luo, Arnab Maiti, et al.
Abstract
Learning to play zero-sum games is a fundamental problem in game theory and machine learning. While significant progress has been made in minimizing external regret in the self-play settings or with full-information feedback, real-world applications often force learners to play against unknown, arbitrary opponents and restrict learners to bandit feedback where only the payoff of the realized action is observable. In such challenging settings, it is well-known that \(Ξ©(\sqrt\{T\})\) external regret is unavoidable (where T is the number of rounds). To overcome this barrier, we investigate adversarial learning in zero-sum games under bandit feedback, aiming to minimize the deficit against the maximin pure strategy -- a metric we term Pure-Strategy Maximin Regret. We analyze this problem under two bandit feedback models: uninformed (only the realized reward is revealed) and informed (both the reward and the opponent's action are revealed). For uninformed bandit learning of normal-form ga
Authors
(none)
Tags
Stats
Related papers
- Best Of Both Worlds: Regret Minimization Versus Minimax Play (2025)0.00
- Generalized Bandit Regret Minimizer Framework In Imperfect Information Extensive-form Game (2022)0.00
- Towards Optimal Regret In Adversarial Linear Mdps With Bandit Feedback (2023)0.00
- The Harder Path: Last Iterate Convergence For Uncoupled Learning In Zero-sum Games With Bandit Feedback (2026)0.00
- No-regret Learning In Unknown Games With Correlated Payoffs (2019)0.00
- Online Learning In Unknown Markov Games (2020)0.00
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)0.00
- Online Markov Decision Processes With Aggregate Bandit Feedback (2021)0.00