Permutation Invariant Policy Optimization For Mean-field Multi-agent Reinforcement Learning: A Principled Approach
2021 · Yan Li, Lingxiao Wang, Jiachen Yang, et al.
Abstract
Multi-agent reinforcement learning (MARL) becomes more challenging in the presence of more agents, as the capacity of the joint state and action spaces grows exponentially in the number of agents. To address such a challenge of scale, we identify a class of cooperative MARL problems with permutation invariance, and formulate it as a mean-field Markov decision process (MDP). To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture. We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence. Moreover, its sample complexity is independent of the number of agents. We validate the theoretical advantages of MF-PPO with numerical experiments in the multi-agent particle environment (MPE). In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outp
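The permutation invariance mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's MF-PPO architecture: it is a generic mean-pooling value function, assuming each agent's observation passes through one shared embedding and the embeddings are averaged before a linear readout. All names here (`init_params`, `embed`, `invariant_value`) and the layer sizes are hypothetical.

```python
import math
import random


def init_params(obs_dim, hidden_dim, seed=0):
    """Random weights for a toy network (hypothetical sizes, not from the paper)."""
    rng = random.Random(seed)
    return {
        "W": [[rng.gauss(0.0, 0.5) for _ in range(obs_dim)] for _ in range(hidden_dim)],
        "w_out": [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)],
    }


def embed(params, obs):
    """Embed one agent's observation; the same weights are shared by every agent."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in params["W"]]


def invariant_value(params, agent_obs):
    """Value estimate that depends only on the set of observations, not their order."""
    n = len(agent_obs)
    embeddings = [embed(params, obs) for obs in agent_obs]
    # Averaging over agents discards the ordering; math.fsum returns a
    # correctly rounded sum, so the result does not depend on summation order.
    pooled = [math.fsum(column) / n for column in zip(*embeddings)]
    return sum(w * p for w, p in zip(params["w_out"], pooled))


# Evaluate the same five observations in two different agent orderings.
rng = random.Random(1)
obs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
shuffled = obs[:]
rng.shuffle(shuffled)

params = init_params(obs_dim=4, hidden_dim=8)
v_original = invariant_value(params, obs)
v_shuffled = invariant_value(params, shuffled)
```

In this sketch the parameter count depends only on `obs_dim` and `hidden_dim`, so the same weights can be evaluated on any number of agents.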
Related papers
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- PIC: Permutation Invariant Critic For Multi-agent Deep Reinforcement Learning (2019)
- Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts (2021)
- Spectra: Scalable Multi-agent Reinforcement Learning With Permutation-free Networks (2025)
- Offline Multi-agent Reinforcement Learning Via In-sample Sequential Policy Optimization (2024)
- Multi-agent Trust Region Policy Optimization (2020)
- Major-minor Mean Field Multi-agent Reinforcement Learning (2023)