Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation
2019 · Jun Sun, Gang Wang, Georgios B. Giannakis, et al.
Abstract
Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an 'additional' projection step to control the 'gradient' bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of the fully decentralized TD(0) learning under both i.i.d.
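The setting described in the abstract can be illustrated with a minimal sketch, assuming a generic consensus-plus-TD(0) scheme: every agent first averages its neighbors' parameter estimates through a mixing matrix, then applies a local TD(0) correction computed from its own private reward. The function name `decentralized_td0`, the mixing matrix `W`, the toy chain, and all parameter values are illustrative assumptions and are not taken from the paper; the paper's exact update rule and step-size conditions may differ.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def decentralized_td0(P, rewards, features, W, gamma=0.9, alpha=0.05,
                      steps=5000, seed=0):
    """Sketch of decentralized TD(0) with linear function approximation.

    All names are hypothetical, chosen for illustration only.
    P[s][t]       : transition probability s -> t under the fixed policy
    rewards[m][s] : private reward agent m observes in state s
    features[s]   : feature vector phi(s)
    W[m][n]       : mixing weight agent m gives to agent n (rows sum to 1)
    """
    rng = random.Random(seed)
    n_agents, n_states, dim = len(W), len(P), len(features[0])
    theta = [[0.0] * dim for _ in range(n_agents)]
    s = 0
    for _ in range(steps):
        # One shared Markovian transition observed by every agent.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        phi, phi_next = features[s], features[s_next]
        updated = []
        for m in range(n_agents):
            # Consensus step: weighted average of the neighbors' estimates.
            mixed = [sum(W[m][n] * theta[n][j] for n in range(n_agents))
                     for j in range(dim)]
            # Local TD(0) error, using agent m's private reward only.
            delta = (rewards[m][s] + gamma * dot(theta[m], phi_next)
                     - dot(theta[m], phi))
            updated.append([mixed[j] + alpha * delta * phi[j]
                            for j in range(dim)])
        theta = updated
        s = s_next
    return theta


if __name__ == "__main__":
    # Toy problem (assumed): 2 states, 2 agents, one-hot features.
    P = [[0.9, 0.1], [0.2, 0.8]]
    rewards = [[1.0, 0.0], [0.0, 1.0]]   # each agent sees a different reward
    features = [[1.0, 0.0], [0.0, 1.0]]
    W = [[0.5, 0.5], [0.5, 0.5]]         # two fully connected agents
    for m, th in enumerate(decentralized_td0(P, rewards, features, W)):
        print(f"agent {m}: theta = {th}")
```

With a doubly stochastic `W`, the network-wide average of the estimates follows a TD(0) recursion driven by the average reward, while with a constant step size the individual agents stay close to that average rather than agreeing exactly.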
Related papers
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- Exact Formulas For Finite-time Estimation Errors Of Decentralized Temporal Difference Learning With Linear Function Approximation (2022)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Fast Multi-agent Temporal-difference Learning Via Homotopy Stochastic Primal-dual Optimization (2019)
- Accelerated Distributional Temporal Difference Learning With Linear Function Approximation (2025)
- Distributed Value Function Approximation For Collaborative Multi-agent Reinforcement Learning (2020)