The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games
2021 · Chao Yu, Akash Velu, Eugene Vinitsky, et al.
Abstract
Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, Google Research Football, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter
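For readers less familiar with PPO, the clipped surrogate objective from the original PPO paper (Proximal Policy Optimization Algorithms, 2017) can be sketched as below. The probability ratio between the new and old policy is clipped to [1 - ε, 1 + ε], and the pessimistic minimum of the clipped and unclipped terms is averaged over samples. The multi-agent part is an assumption of this sketch: it pools (agent, timestep) samples from all agents under one shared policy, which is one common setup for multi-agent PPO and not necessarily the exact configuration used in every experiment here. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not taken from the paper's code.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Each index is one (agent, timestep) sample. Pooling samples from all
    agents under a single shared policy is an assumption of this sketch.
    """
    n = len(advantages)
    if n == 0 or not (len(logp_new) == len(logp_old) == n):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|o) / pi_old(a|o), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (lower) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / n


# Two hypothetical agents, two timesteps each, flattened into one batch.
logp_old = [-1.2, -0.7, -2.0, -0.3]
logp_new = [-1.0, -0.9, -1.5, -0.3]
advantages = [0.5, -0.2, 1.0, 0.0]
objective = ppo_clip_objective(logp_new, logp_old, advantages)
```

In a full training loop, the negative of this value would typically be minimized with a gradient-based optimizer, alongside a value-function loss and often an entropy bonus; those pieces are omitted here to keep the sketch focused on the clipping mechanism.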
Related papers
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- JointPPO: Diving Deeper Into The Effectiveness Of PPO In Multi-agent Reinforcement Learning (2024)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Proximal Policy Optimization Algorithms (2017)
- Truly Proximal Policy Optimization (2019)
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020)
- Policy Optimization With Model-based Explorations (2018)
- Revisiting Design Choices In Proximal Policy Optimization (2020)