Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning
2019 · Chao Qu, Shie Mannor, Huan Xu, et al.
Abstract
We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem arises in many areas, including traffic control, distributed control, and smart grids. We assume that the reward function of each agent can be different and is observed only locally by the agent itself. Furthermore, each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency and a decentralized optimization method, we obtain a principled and data-efficient iterative algorithm. In the first step of each iteration, an agent computes its local policy and value gradients and then updates only its policy parameters. In the second step, the agent propagates messages based on its value function to its neighbors and then updates its own value function. Hence we name the algorithm value propagation. We prove a no
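The two-step iteration described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the softmax temporal consistency objective is replaced by stand-in quadratic losses, and the decentralized optimization method is replaced by plain decentralized gradient descent with neighbor averaging. The ring topology, the scalar parameters, and all names (`LOCAL_REWARD`, `ring_neighbors`, `value_propagation_step`, `run`) are hypothetical and chosen only for illustration.

```python
# Toy sketch under stated assumptions: 4 agents on a ring, each holding one
# scalar "policy" parameter and one scalar "value" parameter. The quadratic
# losses below are placeholders, NOT the objective used in the paper.

LOCAL_REWARD = [1.0, 2.0, 3.0, 6.0]  # hypothetical; each entry is seen by one agent only


def ring_neighbors(n):
    """Return {agent: [left, right]} for a ring communication graph."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def value_propagation_step(policy, value, local_reward, neighbors, lr=0.1):
    """One iteration: local policy update, then neighbor-averaged value update."""
    n = len(value)

    # Step 1: each agent computes its local policy and value gradients
    # and updates ONLY its policy parameter.
    policy_grad = [policy[i] - local_reward[i] for i in range(n)]
    value_grad = [value[i] - local_reward[i] for i in range(n)]
    new_policy = [policy[i] - lr * policy_grad[i] for i in range(n)]

    # Step 2: each agent receives its neighbors' value parameters (the
    # "messages"), averages them with its own, then applies its local
    # value gradient computed in step 1.
    new_value = []
    for i in range(n):
        received = [value[i]] + [value[j] for j in neighbors[i]]
        mixed = sum(received) / len(received)
        new_value.append(mixed - lr * value_grad[i])

    return new_policy, new_value


def run(num_iters=200):
    """Run the toy iteration from zero-initialized parameters."""
    n = len(LOCAL_REWARD)
    neighbors = ring_neighbors(n)
    policy = [0.0] * n
    value = [0.0] * n
    for _ in range(num_iters):
        policy, value = value_propagation_step(
            policy, value, LOCAL_REWARD, neighbors
        )
    return policy, value


if __name__ == "__main__":
    final_policy, final_value = run()
    print("policy:", final_policy)
    print("value:", final_value)
```

In this toy setting the policy parameters stay purely local, while the value parameters are pulled toward a shared estimate through neighbor averaging, even though no agent ever sees another agent's reward. With a constant step size the value parameters agree only approximately; an exact-consensus guarantee would need a different scheme than the one sketched here.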
Related papers
- Fully Decentralized Multi-agent Reinforcement Learning With Networked Agents (2018)
- Mean-field Multi-agent Reinforcement Learning: A Decentralized Network Approach (2021)
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Towards Global Optimality In Cooperative MARL With The Transformation And Distillation Framework (2022)
- Q-value Path Decomposition For Deep Multiagent Reinforcement Learning (2020)
- Decentralized Multi-agent Reinforcement Learning With Networked Agents: Recent Advances (2019)