Distributed Policy Gradient With Variance Reduction In Multi-agent Reinforcement Learning
2021 · Xiaoxiao Zhao, Jinlong Lei, Li Li, et al.
Abstract
This paper studies distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents connected by a communication network aim to find the optimal policy that maximizes the average of all agents' local returns. Because the performance function of policy gradient is non-concave, existing distributed stochastic optimization methods for convex problems cannot be applied directly to policy gradient in MARL. This paper proposes a distributed policy gradient method with variance reduction and gradient tracking to address the high variance of policy gradient, and uses importance weights to address the distribution shift problem in the sampling process. We then provide an upper bound on the mean-squared stationary gap, which depends on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology. We further establish the sample and communication complexity needed to obtain an \(\epsilon\)-approximate stationary point. Numerical
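To make the gradient-tracking ingredient mentioned in the abstract more concrete, here is a minimal sketch of the generic gradient-tracking update over a communication network. It is not the paper's algorithm: it uses exact gradients of a toy concave quadratic per agent (the paper's objective is non-concave and its gradients are sampled), and it omits variance reduction and importance weighting entirely. The names `gradient_tracking`, `ring_weights`, `targets`, and `step` are hypothetical, chosen only for this illustration.

```python
def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents
    (assumed topology): weight 1/3 on self and on each of the two neighbors."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] = 1.0 / 3.0
    return w


def gradient_tracking(targets, weights, step=0.1, iters=300):
    """Gradient tracking (ascent form) on toy local objectives
    J_i(theta) = -(theta - targets[i])**2 / 2, with one scalar theta per agent.
    The average of the J_i is maximized at the mean of `targets`."""
    n = len(targets)

    def grad(i, theta):
        # Exact local gradient of the toy objective (no sampling noise).
        return targets[i] - theta

    def mix(values):
        # One communication round: weighted average over neighbors.
        return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]

    theta = [0.0] * n
    # Each tracker starts at the agent's own local gradient.
    tracker = [grad(i, theta[i]) for i in range(n)]
    for _ in range(iters):
        mixed_theta = mix(theta)
        # Consensus step plus ascent along the tracked average gradient.
        new_theta = [mixed_theta[i] + step * tracker[i] for i in range(n)]
        mixed_tracker = mix(tracker)
        # Tracker update: mix, then add the change in the local gradient.
        tracker = [
            mixed_tracker[i] + grad(i, new_theta[i]) - grad(i, theta[i])
            for i in range(n)
        ]
        theta = new_theta
    return theta


if __name__ == "__main__":
    # Four agents on a ring; every agent should end near the mean of targets.
    print(gradient_tracking([1.0, 2.0, 3.0, 6.0], ring_weights(4)))
```

Each agent keeps a local parameter and a tracker that estimates the network-wide average gradient using only neighbor communication. In a stochastic setting such as the one the abstract describes, the exact gradients here would be replaced by sampled estimates.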
Related papers
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- Descent-guided Policy Gradient For Scalable Cooperative Multi-agent Learning (2026)
- Communication-efficient Policy Gradient Methods For Distributed Reinforcement Learning (2018)
- Settling The Variance Of Multi-agent Policy Gradients (2021)
- Cooperative Multi-agent Reinforcement Learning With Partial Observations (2020)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)