Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-regret Learning In Markov Games
2022 · Wenhao Zhan, Jason D. Lee, Zhuoran Yang
Abstract
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents. Our goal is to develop a no-regret online learning algorithm that (i) takes actions based on the local information observed by the agent and (ii) is able to find the best policy in hindsight. For such a problem, the nonstationary state transitions due to the varying opponent pose a significant challenge. In light of a recent hardness result \citep{liu2022learning}, we focus on the setting where the opponent's previous policies are revealed to the agent for decision making. With such an information structure, we propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves \(\sqrt{K}\)-regret in the context of general function approximation, where \(K\) is the number of episodes. Moreover, when all the agents adopt DORIS, we pro…
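The abstract describes DORIS as mirror descent over a hyperpolicy, i.e. a probability distribution over policies. The sketch below is a heavily simplified, hypothetical illustration of that kind of update, not the authors' algorithm. It assumes a finite policy class and full information: the value each policy would have earned against each episode's opponent is simply handed to the learner, so the optimistic, function-approximation-based value estimation that DORIS performs is replaced by given numbers in [0, 1]. With a KL (entropy) regularizer, one mirror-descent step reduces to the exponential-weights update p_{k+1}(π) ∝ p_k(π) · exp(η · V^k(π)), and regret is measured against the best fixed policy in hindsight. The names `hyperpolicy_mirror_descent`, `values`, and `eta` are assumptions made for illustration.

```python
import math
import random


def hyperpolicy_mirror_descent(values, eta):
    """Entropic mirror descent (exponential weights) over a finite policy class.

    Hypothetical toy: values[k][i] is the value in [0, 1] that policy i would
    earn in episode k against that episode's opponent. These values are assumed
    to be known (full information); DORIS instead relies on optimistic estimates.
    Returns the hyperpolicy used in each episode and the realized regret.
    """
    n = len(values[0])
    log_w = [0.0] * n  # log-weights of each policy; uniform start
    hyperpolicies = []
    earned = 0.0
    for v in values:
        # Normalize weights into a distribution over policies (the hyperpolicy),
        # subtracting the max log-weight for numerical stability.
        m = max(log_w)
        w = [math.exp(x - m) for x in log_w]
        z = sum(w)
        p = [x / z for x in w]
        hyperpolicies.append(p)
        # Expected value collected by the hyperpolicy in this episode.
        earned += sum(pi * vi for pi, vi in zip(p, v))
        # Mirror-descent step with a KL regularizer = multiplicative update.
        log_w = [lw + eta * vi for lw, vi in zip(log_w, v)]
    # Regret against the best single policy in hindsight.
    best = max(sum(v[i] for v in values) for i in range(n))
    return hyperpolicies, best - earned


if __name__ == "__main__":
    random.seed(0)
    K, n = 2000, 5
    # Arbitrary value sequence standing in for a changing opponent.
    vals = [[random.random() for _ in range(n)] for _ in range(K)]
    eta = math.sqrt(8 * math.log(n) / K)
    _, regret = hyperpolicy_mirror_descent(vals, eta)
    print(f"regret = {regret:.2f}, bound = {math.sqrt(K * math.log(n) / 2):.2f}")
```

With step size η = sqrt(8 ln n / K) and values in [0, 1], the standard Hedge analysis bounds the regret of this toy by sqrt(K ln n / 2) for any value sequence, including an adversarial one. That mirrors the \(\sqrt{K}\) rate stated in the abstract, but only in this much simpler full-information, finite-class setting.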
Related papers
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)
- Learning In Markov Games With Adaptive Adversaries: Policy Regret, Fundamental Barriers, And Efficient Algorithms (2024)
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)
- Mirror Descent Policy Optimisation For Robust Constrained Markov Decision Processes (2025)
- Online Learning In Unknown Markov Games (2020)
- Local And Adaptive Mirror Descents In Extensive-form Games (2023)
- Decentralized Model-free Reinforcement Learning In Stochastic Games With Average-reward Objective (2023)
- Optimistic Policy Learning Under Pessimistic Adversaries With Regret And Violation Guarantees (2026)