Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning
2023 · Yulai Zhao, Zhuoran Yang, Zhaoran Wang, et al.
Abstract
Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent direction for each local policy. Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate. We extend our algorithm to the off-policy setting and introduce pessimism to policy evaluation, which aligns with experiments. To our knowledge, this is the first provably convergent multi-agent PPO algorithm in cooperative Markov games.
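The per-agent update described in the abstract can be illustrated with a minimal tabular sketch. It assumes the KL-regularized (mirror-descent) form of a PPO step, in which each agent reweights its current policy by the exponentiated local action value; the abstract does not spell out the exact update rule, so this form, the function names `local_policy_step` and `multi_agent_sweep`, the step size `eta`, and the toy numbers are all illustrative assumptions rather than the authors' implementation. The local action values are simply taken as given tables here, whereas the paper estimates them through policy evaluation.

```python
import math

def local_policy_step(pi_s, q_s, eta):
    """One KL-regularized step at a single state (assumed form, not from the paper):
    new_pi(a) is proportional to pi(a) * exp(eta * q(a))."""
    q_max = max(q_s)  # subtract the max for numerical stability
    weights = [p * math.exp(eta * (q - q_max)) for p, q in zip(pi_s, q_s)]
    total = sum(weights)
    return [w / total for w in weights]

def multi_agent_sweep(policies, local_q, eta):
    """Apply the local step to every agent and every state.

    policies[i][s] is agent i's action distribution at state s;
    local_q[i][s] is the (hypothetical) local action-value table for that agent.
    """
    return [
        [local_policy_step(pi_s, q_s, eta) for pi_s, q_s in zip(pi, q)]
        for pi, q in zip(policies, local_q)
    ]

# Toy example: two agents, one state, two actions each.
policies = [[[0.5, 0.5]], [[0.25, 0.75]]]
local_q = [[[1.0, 0.0]], [[0.0, 0.0]]]
updated = multi_agent_sweep(policies, local_q, eta=1.0)
```

In this toy case the first agent shifts probability toward the action with the higher local value, while the second agent, whose local values are all equal, keeps its policy unchanged.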
Related papers
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- Cautiously Optimistic Policy Optimization And Exploration With Linear Function Approximation (2021)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- Proximal Policy Optimization Algorithms (2017)
- Policy Regularization Via Noisy Advantage Values For Cooperative Multi-agent Actor-critic Methods (2021)
- Policy Optimization With Model-based Explorations (2018)