Provably Efficient Fictitious Play Policy Optimization For Zero-sum Markov Games With Structured Transitions
2022 Β· Shuang Qiu, Xiaohan Wei, Jieping Ye, et al.
Abstract
While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight \(\widetilde\{\mathcal\{O\}\}(\sqrt\{K\})\) regret bounds after \(K\) episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of
Authors
(none)
Tags
Stats
Related papers
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)0.00
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)0.00
- Efficient Competitive Self-play Policy Optimization (2020)0.00
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)0.00
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)0.00
- Fictitious Play In Markov Games With Single Controller (2022)6.77
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)0.00
- Optimistic Policy Gradient In Multi-player Markov Games With A Single Controller: Convergence Beyond The Minty Property (2023)3.58