B2MAPO: A Batch-by-batch Multi-agent Policy Optimization To Balance Performance And Efficiency
2024 · Wenjing Zhang, Wei Zhang, Wenqing Hu, et al.
Abstract
Most multi-agent reinforcement learning approaches adopt one of two types of policy optimization methods, updating policies either simultaneously or sequentially. Simultaneously updating the policies of all agents introduces a non-stationarity problem. Although sequentially updating policies agent-by-agent in an appropriate order improves policy performance, it is prone to low efficiency due to sequential execution, resulting in longer model training and execution time. Intuitively, partitioning the policies of all agents according to their interdependence and updating the joint policy batch-by-batch can effectively balance performance and efficiency. However, determining the optimal batch partition of policies and the batch updating order is challenging. First, a sequential batched policy updating scheme, B2MAPO (Batch by Batch Multi-Agent Policy Optimization), is proposed with a theoretical guarantee of a monotonic, incrementally tightened bound. Second, a universal modularized plug-and-pla
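To make the contrast between the three update schemes concrete, here is a minimal sketch of the scheduling idea only. It is not the B2MAPO algorithm: the batch partition below is hand-picked for illustration, whereas the abstract states that choosing the partition and the batch order is the hard part. The function names (`simultaneous_schedule`, `sequential_schedule`, `batched_schedule`, `run_schedule`) and the `update_fn` callback are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def simultaneous_schedule(agents: Sequence[int]) -> List[List[int]]:
    # One round: every agent is updated at the same time.
    return [list(agents)]


def sequential_schedule(agents: Sequence[int]) -> List[List[int]]:
    # One round per agent: agents are updated strictly one after another.
    return [[a] for a in agents]


def batched_schedule(batches: Sequence[Sequence[int]]) -> List[List[int]]:
    # One round per batch: batches run in order, agents inside a batch together.
    # The partition is supplied by the caller (an assumption of this sketch).
    return [list(b) for b in batches]


def run_schedule(
    schedule: List[List[int]],
    update_fn: Callable[[int, int], None],
) -> List[Tuple[int, Tuple[int, ...]]]:
    # Apply update_fn(agent, round_index) round by round and log each round.
    log = []
    for round_index, batch in enumerate(schedule):
        for agent in batch:  # conceptually parallel within a round
            update_fn(agent, round_index)
        log.append((round_index, tuple(batch)))
    return log


agents = [0, 1, 2, 3]
updated_in_round = {}


def record(agent: int, round_index: int) -> None:
    # Placeholder for a real policy update; here it only records the round.
    updated_in_round[agent] = round_index


log = run_schedule(batched_schedule([[0, 1], [2, 3]]), record)
print(len(simultaneous_schedule(agents)), len(sequential_schedule(agents)), len(log))
```

With four agents, the simultaneous schedule needs one round, the sequential schedule four, and the illustrative two-batch partition two, which is the efficiency trade-off the abstract describes.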
Related papers
- Order Matters: Agent-by-agent Policy Optimization (2023)
- Multi-path Policy Optimization (2019)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Multi-agent Guided Policy Optimization (2025)
- Multi-agent Constrained Policy Optimisation (2021)
- End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025)
- Offline Multi-agent Reinforcement Learning Via In-sample Sequential Policy Optimization (2024)
- Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts (2021)