Local And Adaptive Mirror Descents In Extensive-form Games
2023 · Côme Fiegel, Pierre Ménard, Tadashi Kozuno, et al.
Abstract
We study how to learn \(\epsilon\)-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by \(T\). Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of \(\tilde{\mathcal{O}}(T^{-1/2})\) with high probability and has a near-optimal dependence on the game parameters when applied wi
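The idea of running OMD locally at each information set with its own decreasing learning rate can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an entropy regularizer (so the OMD step becomes a multiplicative-weights update), a hypothetical learning-rate schedule `base_rate / sqrt(1 + visits)`, and a one-step importance weight under the fixed sampling policy. The paper's regularized loss and exact schedule are not specified in the abstract and are omitted here. The names `LocalOMD` and `importance_weighted_loss` are illustrative only.

```python
import math


class LocalOMD:
    """Illustrative entropy-regularized OMD learner for ONE information set."""

    def __init__(self, num_actions, base_rate=1.0):
        self.policy = [1.0 / num_actions] * num_actions  # start uniform
        self.base_rate = base_rate  # assumed hyperparameter, not from the paper
        self.visits = 0             # updates applied at this information set

    def learning_rate(self):
        # Assumed schedule: decreases with this information set's own count.
        return self.base_rate / math.sqrt(1 + self.visits)

    def update(self, loss_estimates):
        eta = self.learning_rate()
        shift = min(loss_estimates)  # shifting losses leaves the result unchanged
        weights = [
            p * math.exp(-eta * (loss - shift))
            for p, loss in zip(self.policy, loss_estimates)
        ]
        total = sum(weights)
        self.policy = [w / total for w in weights]
        self.visits += 1


def importance_weighted_loss(num_actions, action, loss, sampling_prob):
    """Loss estimate for an action drawn with probability `sampling_prob`
    under a fixed sampling policy (simplified to a single step)."""
    estimate = [0.0] * num_actions
    estimate[action] = loss / sampling_prob
    return estimate


# Usage: action 0 always incurs loss 1 and is sampled with probability 0.5,
# so probability mass moves toward action 1 while the step size shrinks.
learner = LocalOMD(num_actions=2)
for _ in range(20):
    learner.update(importance_weighted_loss(2, action=0, loss=1.0, sampling_prob=0.5))
```

Because the importance weight depends only on the fixed sampling policy, it does not change as the learned policy evolves, which is the variance-reduction intuition the abstract describes. A full implementation would keep one such learner per information set and weight by the sampling policy's probability of the whole action sequence.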
Related papers
- Policy Mirror Ascent For Efficient And Independent Learning In Mean Field Games (2022)
- Model-free Learning For Two-player Zero-sum Partially Observable Markov Games With Perfect Recall (2021)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-regret Learning In Markov Games (2022)
- Multi-agent Online Learning In Time-varying Games (2018)
- Policy Mirror Descent With Temporal Difference Learning: Sample Complexity Under Online Markov Data (2025)
- Independent Policy Mirror Descent For Markov Potential Games: Scaling To Large Number Of Players (2024)
- Mirror Descent Policy Optimisation For Robust Constrained Markov Decision Processes (2025)
- Policy Mirror Descent Inherently Explores Action Space (2023)