Variational Policy Propagation For Multi-agent Reinforcement Learning
2020 · Chao Qu, Hui Li, Chang Liu, et al.
Abstract
We propose a *collaborative* multi-agent reinforcement learning algorithm named variational policy propagation (VPP) that learns a *joint* policy through interactions among agents. We prove that, under some mild conditions, the joint policy is a Markov Random Field, which effectively reduces the policy space. We integrate variational inference into the policy as special differentiable layers, so that actions can be sampled efficiently from the Markov Random Field and the overall policy remains differentiable. We evaluate our algorithm on several large-scale, challenging tasks and demonstrate that it outperforms previous state-of-the-art methods.
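The abstract does not specify the exact form of the inference layers. As a rough illustration only, the sketch below runs naive mean-field updates on a pairwise Markov Random Field over discrete agent actions and then samples each agent's action from its approximate marginal. Mean-field is used here as a generic stand-in for variational inference, not as the authors' layer; the potential layout (`unary`, `pairwise`, `edges`) and the helper names `mean_field` and `sample_actions` are hypothetical. In an actual policy network, the update loop would be unrolled inside an autodiff framework so that gradients flow through it.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mean_field(unary, pairwise, edges, n_iters=10):
    """Coordinate-ascent mean-field marginals q_i(a_i) for a pairwise MRF.

    Hypothetical layout (an assumption, not taken from the paper):
      unary[i][a]            log-potential of agent i taking action a
      pairwise[(i, j)][a][b] log-potential of agent i taking a and agent j taking b
      edges                  list of undirected neighbour pairs (i, j)
    """
    n = len(unary)
    q = [softmax(u) for u in unary]  # initialise from the unary terms alone
    neighbours = {i: [] for i in range(n)}
    for (i, j) in edges:
        # Store each edge from both endpoints; 'transposed' marks the j-side view.
        neighbours[i].append((j, pairwise[(i, j)], False))
        neighbours[j].append((i, pairwise[(i, j)], True))
    for _ in range(n_iters):
        for i in range(n):
            logits = list(unary[i])
            for (j, psi, transposed) in neighbours[i]:
                for a in range(len(logits)):
                    # Expected pairwise log-potential under neighbour j's current marginal.
                    logits[a] += sum(
                        q[j][b] * (psi[b][a] if transposed else psi[a][b])
                        for b in range(len(q[j]))
                    )
            q[i] = softmax(logits)
    return q


def sample_actions(q, rng):
    """Sample one action per agent independently from its mean-field marginal."""
    return [rng.choices(range(len(qi)), weights=qi)[0] for qi in q]


if __name__ == "__main__":
    # Two agents, two actions each; the pairwise term rewards matching actions.
    unary = [[0.0, 1.0], [0.0, 0.0]]
    pairwise = {(0, 1): [[2.0, 0.0], [0.0, 2.0]]}
    q = mean_field(unary, pairwise, edges=[(0, 1)], n_iters=20)
    print(q)
    print(sample_actions(q, random.Random(0)))
```

In this toy setting, agent 1 has no preference of its own, yet its marginal is pulled toward action 1 because the coordination potential couples it to agent 0, which prefers that action. This is the kind of inter-agent dependence a factorized per-agent policy cannot express directly.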
Related papers
- NVIF: Neighboring Variational Information Flow For Large-scale Cooperative Multi-agent Scenarios (2022)
- Variational Automatic Curriculum Learning For Sparse-reward Cooperative Multi-agent Problems (2021)
- V-learning -- A Simple, Efficient, Decentralized Algorithm For Multiagent RL (2021)
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)
- MAVEN: Multi-agent Variational Exploration (2019)
- Distributed Policy Gradient With Variance Reduction In Multi-agent Reinforcement Learning (2021)
- A Variational Approach To Mutual Information-based Coordination For Multi-agent Reinforcement Learning (2023)
- Scalable Centralized Deep Multi-agent Reinforcement Learning Via Policy Gradients (2018)