A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning
2019 · Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, et al.
Abstract
This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm.
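As a rough illustration of the kind of update the abstract describes, the sketch below pairs a standard emphatic TD(0) step under linear function approximation with a simple parameter-averaging consensus step over the communication graph. It is not the paper's algorithm. The function name `etd0_consensus_step`, the consensus weight matrix `C`, the constant interest of 1, and the assumption that every agent already knows the joint importance-sampling ratio `rho` are all hypothetical choices made for illustration.

```python
import numpy as np


def etd0_consensus_step(thetas, F_prev, rho_prev, rho, x, x_next, rewards, C,
                        alpha=0.1, gamma=0.9, interest=1.0):
    """One synchronous step for N agents sharing a linear value estimate.

    thetas   : (N, d) array, one weight vector per agent
    F_prev   : follow-on trace from the previous step (use 0.0 at the start)
    rho_prev : importance ratio pi/mu of the previous action
    rho      : importance ratio pi/mu of the current action
               (assumed known to all agents in this sketch)
    x, x_next: (d,) feature vectors of the current and next state
    rewards  : (N,) array of local rewards, one per agent
    C        : (N, N) row-stochastic consensus weights for the current graph
    """
    # Follow-on trace of emphatic TD: F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t).
    F = gamma * rho_prev * F_prev + interest

    # Each agent forms a TD error from its own reward and its own weights.
    delta = rewards + gamma * (thetas @ x_next) - (thetas @ x)

    # Local emphatic TD(0) update, weighted by the emphasis F and the ratio rho.
    local = thetas + alpha * rho * F * delta[:, None] * x[None, :]

    # Consensus: each agent replaces its weights with a weighted average
    # of its neighbors' locally updated weights.
    return C @ local, F


# Example: three agents, two features, fully connected uniform averaging.
thetas0 = np.zeros((3, 2))
C_uniform = np.full((3, 3), 1.0 / 3.0)
thetas1, F1 = etd0_consensus_step(
    thetas0, F_prev=0.0, rho_prev=1.0, rho=1.0,
    x=np.array([1.0, 0.0]), x_next=np.array([0.0, 1.0]),
    rewards=np.array([1.0, 2.0, 3.0]), C=C_uniform,
)
```

With uniform averaging, all agents end the step holding the same weights, which reflect the average of the local rewards. A time-varying graph would correspond to passing a different `C` at each step.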
Related papers
- Distributed Off-policy Actor-critic Reinforcement Learning With Policy Consensus (2019)
- Actor-attention-critic For Multi-agent Reinforcement Learning (2018)
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017)
- Multi-agent Natural Actor-critic Reinforcement Learning Algorithms (2021)
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021)
- Fully Decentralized Multi-agent Reinforcement Learning With Networked Agents (2018)
- Distributed Value Function Approximation For Collaborative Multi-agent Reinforcement Learning (2020)
- Scalable Centralized Deep Multi-agent Reinforcement Learning Via Policy Gradients (2018)