Multi-agent Online Learning In Time-varying Games
2018 · Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, et al.
Abstract
We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback, i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions.
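As a rough illustration of claim (a), the sketch below runs projected gradient play (mirror descent with a Euclidean regularizer) with exact gradient feedback in a toy two-player game. Everything specific here is an assumption made for the example and is not taken from the paper: the quadratic payoffs, the coupling constant b, the 1/t drift that makes the stage games stabilize, the 1/sqrt(t) step size, and the function names.

```python
import numpy as np

# Hypothetical toy game (not from the paper):
#   u_1(x) = -x_1^2 / 2 + x_1 * (a_1(t) + b * x_2)
#   u_2(x) = -x_2^2 / 2 + x_2 * (a_2(t) - b * x_1)
# The symmetric part of the gradient field's Jacobian is -I,
# so every stage game is strongly monotone.
A_LIMIT = np.array([1.0, 0.5])   # assumed limit of the drifting parameters
DRIFT = np.array([1.0, -1.0])    # assumed perturbation, decays like 1/t


def payoff_gradient(x, t, b=0.5):
    """Each player's payoff gradient w.r.t. their own action at stage t."""
    a = A_LIMIT + DRIFT / t
    return np.array([
        -x[0] + a[0] + b * x[1],
        -x[1] + a[1] - b * x[0],
    ])


def limit_equilibrium(b=0.5):
    """Interior Nash equilibrium of the limit game (solves gradient = 0)."""
    a1, a2 = A_LIMIT
    return np.array([a1 + b * a2, a2 - b * a1]) / (1.0 + b ** 2)


def run_projected_gradient_play(num_steps=5000, radius=2.0):
    """Euclidean mirror descent: gradient step, then projection onto the box."""
    x = np.zeros(2)
    for t in range(1, num_steps + 1):
        step = 1.0 / np.sqrt(t)  # assumed vanishing step-size schedule
        x = np.clip(x + step * payoff_gradient(x, t), -radius, radius)
    return x


if __name__ == "__main__":
    final_play = run_projected_gradient_play()
    print("final play:       ", final_play)
    print("limit equilibrium:", limit_equilibrium())
```

In this toy setting the final iterate ends up close to the equilibrium of the limit game. The bandit-feedback case mentioned in the abstract would replace `payoff_gradient` with an estimate built from observed payoffs, which this sketch does not attempt.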
Related papers
- Finite-time Last-iterate Convergence For Multi-agent Learning In Games (2020)
- Convergence Analysis Of Gradient-based Learning With Non-uniform Learning Rates In Non-cooperative Multi-agent Settings (2019)
- Asymptotic Convergence And Performance Of Multi-agent Q-learning Dynamics (2023)
- Local And Adaptive Mirror Descents In Extensive-form Games (2023)
- On The Stability Of Learning In Network Games With Many Players (2024)
- Policy Gradient With Self-attention For Model-free Distributed Nonlinear Multi-agent Games (2025)
- Policy Mirror Ascent For Efficient And Independent Learning In Mean Field Games (2022)
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)