JointPPO: Diving Deeper into the Effectiveness of PPO in Multi-Agent Reinforcement Learning
2024 · Chenxing Liu, Guizhong Liu
Abstract
While Centralized Training with Decentralized Execution (CTDE) has become the prevailing paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be suitable for scenarios in which agents can fully communicate and share observations with each other. Fully centralized methods, also known as Centralized Training with Centralized Execution (CTCE) methods, can fully utilize the observations of all agents by treating the entire system as a single agent. However, traditional CTCE methods suffer from scalability issues due to the exponential growth of the joint action space. To address these challenges, in this paper we propose JointPPO, a CTCE method that uses Proximal Policy Optimization (PPO) to directly optimize the joint policy of the multi-agent system. JointPPO decomposes the joint policy into conditional probabilities, transforming the decision-making process into a sequence generation task. A Transformer-based joint policy network is constructed, trained with a PPO loss tailo
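The decomposition of a joint policy into conditional probabilities mentioned in the abstract follows the chain rule of probability. The sketch below is an illustration only, not the paper's implementation: `conditional_probs` is a hypothetical toy stand-in for a policy network, observations are omitted for brevity, the agent and action counts are arbitrary, and greedy decoding is used purely to keep the example deterministic. It shows that choosing actions one agent at a time needs only `N_AGENTS` small distributions per step, while the product of those conditionals is still a valid distribution over the full joint action space.

```python
import itertools
import math

N_AGENTS = 3   # hypothetical number of agents
N_ACTIONS = 4  # hypothetical number of actions per agent


def conditional_probs(agent_idx, prev_actions):
    """Toy stand-in for a policy network (an assumption for illustration).

    Returns a softmax distribution over this agent's actions that depends
    on the agent index and on the actions already chosen by earlier agents.
    """
    logits = [
        math.sin(1.0 + a + agent_idx + sum(prev_actions))
        for a in range(N_ACTIONS)
    ]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def joint_log_prob(joint_action):
    """Chain rule: log pi(a1..an) = sum_i log pi(a_i | a_1..a_{i-1})."""
    total = 0.0
    for i, a in enumerate(joint_action):
        total += math.log(conditional_probs(i, joint_action[:i])[a])
    return total


def greedy_joint_action():
    """Build a joint action one agent at a time, like generating a sequence."""
    actions = []
    for i in range(N_AGENTS):
        probs = conditional_probs(i, tuple(actions))
        actions.append(max(range(N_ACTIONS), key=probs.__getitem__))
    return tuple(actions)


# Size of the flat joint action space grows exponentially with the agent count.
joint_space_size = N_ACTIONS ** N_AGENTS

# The factorized probabilities sum to 1 over every joint action.
total_prob = sum(
    math.exp(joint_log_prob(ja))
    for ja in itertools.product(range(N_ACTIONS), repeat=N_AGENTS)
)

print("joint action space size:", joint_space_size)
print("greedy joint action:", greedy_joint_action())
print("sum of joint probabilities:", round(total_prob, 6))
```

In this toy setting a flat joint policy would need one output per joint action (`N_ACTIONS ** N_AGENTS`), whereas the sequential form only ever outputs `N_ACTIONS` probabilities at a time.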
Related papers
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- Multi-agent Guided Policy Optimization (2025)
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- Truly Proximal Policy Optimization (2019)
- Multi-path Policy Optimization (2019)
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Multi-agent Trust Region Policy Optimization (2020)