Semi-on-policy Training For Sample Efficient Multi-agent Policy Gradients
2021 · Bozhidar Vasilev, Tarun Gupta, Bei Peng, et al.
Abstract
Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods. We enhance two state-of-the-art policy gradient algorithms with SOP training, demonstrating significant performance improvements. Furthermore, we show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
Related papers
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- Counterfactual Multi-agent Policy Gradients (2017)
- Off-OAB: Off-policy Policy Gradient Method With Optimal Action-dependent Baseline (2024)
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- SMACv2: An Improved Benchmark For Cooperative Multi-agent Reinforcement Learning (2022)