FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility
2023 Β· Lang Feng, Dong Xing, Junru Zhang, et al.
Abstract
Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach is built on the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure formulates the interconnections among agents in a more general manner, i.e., as the interconnections among pipelines, which makes it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and then develop a practical algorithm, Full-Pipeline PPO (FP3O), through several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraftII tasks demonstrate that FP3O outperforms …
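The full-pipeline paradigm rests on the fact that a joint advantage can be written as a sum of per-agent advantages in more than one equivalent way. The sketch below checks that identity numerically on a toy one-state game. It is an illustration under stated assumptions, not FP3O itself: it assumes the "equivalent decompositions" correspond to different agent orderings (the multi-agent advantage decomposition used in earlier multi-agent trust-region work), and the game size `N` and `K`, the random `Q` table, the factorized policies `pi`, and the helpers `q_subset` and `decomposed_advantage` are all hypothetical.

```python
import itertools
import random

# Toy cooperative game (assumption): one state, N agents, K discrete actions each.
N, K = 3, 2
random.seed(0)

# Hypothetical joint action-value table Q(a^1, ..., a^N) for the single state.
Q = {a: random.uniform(-1.0, 1.0) for a in itertools.product(range(K), repeat=N)}


def random_dist(k):
    """Random probability distribution over k actions."""
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


# Independent (factorized) per-agent policies pi_i(a^i).
pi = [random_dist(K) for _ in range(N)]


def q_subset(fixed):
    """Q_pi(s, a^S): expected joint Q when the agents in `fixed` (agent -> action)
    take the given actions and every other agent samples from its own policy."""
    total = 0.0
    for a in itertools.product(range(K), repeat=N):
        if any(a[i] != act for i, act in fixed.items()):
            continue
        prob = 1.0
        for i in range(N):
            if i not in fixed:
                prob *= pi[i][a[i]]
        total += prob * Q[a]
    return total


V = q_subset({})  # state value: no agent's action is fixed


def decomposed_advantage(joint_action, order):
    """Sum of per-agent advantages A(a^{i_1:j-1}, a^{i_j}) along one agent ordering."""
    fixed, total = {}, 0.0
    for agent in order:
        before = q_subset(fixed)
        fixed[agent] = joint_action[agent]
        total += q_subset(fixed) - before  # this agent's advantage given its predecessors
    return total


# Every ordering should telescope to the same joint advantage Q(a) - V.
max_gap = 0.0
for a in Q:
    joint_adv = Q[a] - V
    for order in itertools.permutations(range(N)):
        max_gap = max(max_gap, abs(decomposed_advantage(a, order) - joint_adv))
print(f"largest deviation across all orderings: {max_gap:.2e}")
```

Because each ordering's per-agent terms telescope to the same Q(a) - V, any ordering could in principle drive its own optimization pipeline; how FP3O actually builds, weights, and connects its pipelines is specified in the paper, not in this sketch.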
Related papers
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- JointPPO: Diving Deeper Into The Effectiveness Of PPO In Multi-agent Reinforcement Learning (2024)
- Multi-path Policy Optimization (2019)
- Policy Regularization Via Noisy Advantage Values For Cooperative Multi-agent Actor-critic Methods (2021)
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Permutation Invariant Policy Optimization For Mean-field Multi-agent Reinforcement Learning: A Principled Approach (2021)
- Multi-agent Trust Region Policy Optimization (2020)