Distributed Value Function Approximation For Collaborative Multi-agent Reinforcement Learning
2020 · Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
Abstract
In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information structure constraints, limiting inter-agent communications to small neighborhoods. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy gradient temporal difference learning algorithms, including eligibility traces with state dependent parameters, and 2) linear stochastic time varying consensus schemes, represented by directed graphs. The proposed algorithms differ by their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations. The main contribution of the paper is a convergence analysis based on the general properties of the underlying Feller-Markov processes and the stochastic time varying consensus model. We prove, under general assumptions, that the parame
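The two-part structure described in the abstract (a local off-policy gradient temporal difference update followed by a linear consensus step over a directed graph) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a TDC-style local update without eligibility traces (lambda = 0), fixed step sizes, consensus applied only to the main parameter vector, and a row-stochastic weight matrix. All function and parameter names (`local_tdc_step`, `consensus_step`, `one_round`, `weights`) are hypothetical.

```python
# Illustrative sketch only. The lambda = 0 simplification, the fixed step
# sizes, and all names below are assumptions, not taken from the paper.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def local_tdc_step(theta, w, phi, phi_next, reward, rho,
                   gamma=0.9, alpha=0.1, beta=0.1):
    """One TDC-style off-policy update for a single agent (no traces).

    theta: value-function weights, w: auxiliary weights,
    phi / phi_next: feature vectors of the current and next state,
    rho: importance-sampling ratio between target and behavior policy.
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)
    new_theta = [t + alpha * rho * (delta * p - gamma * w_phi * pn)
                 for t, p, pn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (rho * delta - w_phi) * p
             for wi, p in zip(w, phi)]
    return new_theta, new_w


def consensus_step(params, weights):
    """Replace each agent's vector by a weighted average of its neighbors'.

    weights[i][j] is the weight agent i gives to agent j's vector; it is
    zero when j is not an in-neighbor of i, and each row sums to one.
    """
    n, dim = len(params), len(params[0])
    return [[sum(weights[i][j] * params[j][k] for j in range(n))
             for k in range(dim)]
            for i in range(n)]


def one_round(thetas, ws, transitions, weights):
    """Local update for every agent, then one consensus step on theta.

    transitions[i] is the tuple (phi, phi_next, reward, rho) observed by
    agent i in this round.
    """
    updated = [local_tdc_step(th, w, *tr)
               for th, w, tr in zip(thetas, ws, transitions)]
    new_thetas = [u[0] for u in updated]
    new_ws = [u[1] for u in updated]
    return consensus_step(new_thetas, weights), new_ws
```

In this sketch, a time-varying communication graph would correspond to passing a different `weights` matrix on each call to `one_round`. Whether the auxiliary vector `w` is also averaged across agents is a design choice the sketch leaves out.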
Related papers
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- Cooperative Multi-agent Reinforcement Learning: Asynchronous Communication And Linear Function Approximation (2023)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Multi-agent Fully Decentralized Value Function Learning With Linear Convergence Rates (2018)
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019)
- Provably Efficient Cooperative Multi-agent Reinforcement Learning With Function Approximation (2021)
- Fast Multi-agent Temporal-difference Learning Via Homotopy Stochastic Primal-dual Optimization (2019)
- Multi-agent Policy Optimization With Approximatively Synchronous Advantage Estimation (2020)