Local And Adaptive Mirror Descents In Extensive-form Games

Abstract

We study how to learn \(\epsilon\)-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by \(T\). Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of \(\tilde{\mathcal{O}}(T^{-1/2})\) with high probability and has a near-optimal dependence on the game parameters when applied wi
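
To make the idea of applying OMD locally at each information set more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a negative-entropy regularizer (so each OMD step becomes a multiplicative-weights update) and a learning rate of `base_rate / sqrt(visits)` that decays separately for each information set. The class `LocalOMD`, its parameters, and the information-set key `"I0"` are hypothetical names introduced for illustration; the abstract does not specify the regularized loss or the exact learning-rate schedule.

```python
import numpy as np


class LocalOMD:
    """Illustrative per-information-set entropic OMD (assumed form, not the paper's)."""

    def __init__(self, num_actions, base_rate=1.0):
        self.num_actions = num_actions
        self.base_rate = base_rate  # hypothetical scaling constant
        self.policies = {}          # information set -> action distribution
        self.visits = {}            # information set -> number of updates so far

    def policy(self, infoset):
        # Every information set starts from the uniform policy.
        if infoset not in self.policies:
            self.policies[infoset] = np.full(self.num_actions, 1.0 / self.num_actions)
            self.visits[infoset] = 0
        return self.policies[infoset]

    def update(self, infoset, loss_estimate):
        """Apply one OMD step at a single information set only."""
        current = self.policy(infoset)
        self.visits[infoset] += 1
        # Assumed schedule: the rate shrinks with this information set's own count.
        eta = self.base_rate / np.sqrt(self.visits[infoset])
        # Entropic mirror step, computed in log space for numerical stability.
        logits = np.log(current) - eta * np.asarray(loss_estimate, dtype=float)
        logits -= logits.max()
        weights = np.exp(logits)
        self.policies[infoset] = weights / weights.sum()
        return self.policies[infoset]


learner = LocalOMD(num_actions=3)
updated = learner.update("I0", [1.0, 0.0, 0.5])  # action 1 has the lowest loss
```

After this single update, the policy at `"I0"` shifts probability toward the action with the lowest estimated loss, while every other information set is left untouched. In the fixed sampling setting described above, the loss estimates would be built from trajectories drawn under the fixed sampling policy; how they are constructed is not covered by this sketch.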
