Policy Regularization Via Noisy Advantage Values For Cooperative Multi-agent Actor-critic Methods
2021 · Jian Hu, Siyue Hu, Shih-Wei Liao
Abstract
Recent work has applied Proximal Policy Optimization (PPO) to cooperative multi-agent tasks, for example Independent PPO (IPPO) and vanilla Multi-agent PPO (MAPPO), which uses a centralized value function. However, previous literature shows that MAPPO may not perform as well as IPPO or fine-tuned QMIX on the StarCraft Multi-Agent Challenge (SMAC). MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO with carefully designed agent-specific features, which may limit the algorithm's utility. By contrast, we find that MAPPO may suffer from the problem of Policies Overfitting in Multi-agent Cooperation (POMAC), because the agents learn their policies from sampled advantage values. POMAC may then cause the multi-agent policies to be updated in a suboptimal direction and prevent the agents from exploring better trajectories. In this paper, to mitigate this overfitting of the multi-agent policies, we propose a novel policy regularization method, which disturbs the a
Related papers
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- AM-PPO: (advantage) Alpha-modulation With Proximal Policy Optimization (2025)
- Multi-path Policy Optimization (2019)
- Robust And Diverse Multi-agent Learning Via Rational Policy Gradient (2025)
- ANO: A Principled Approach To Robust Policy Optimization (2026)