Distributed TD(0) With Almost No Communication
2021 · Rui Liu, Alex Olshevsky
Abstract
We provide a new non-asymptotic analysis of distributed TD(0) with linear function approximation. Our approach relies on "one-shot averaging," where \(N\) agents run local copies of TD(0) and average the outcomes only once at the very end. We consider two models: one in which the agents interact with an environment they can observe and whose transitions depend on all of their actions (which we call the global state model), and one in which each agent can run a local copy of an identical Markov Decision Process (which we call the local state model). In the global state model, we show that the convergence rate of our distributed one-shot averaging method matches the known convergence rate of TD(0). By contrast, the best previously known convergence rate could, according to the worst-case bounds given, underperform the non-distributed version by \(O(N^3)\) in terms of the number of agents \(N\). In the local state model, we demonstrate a version of the line
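The one-shot averaging idea described in the abstract can be illustrated with a minimal sketch in the local state model: each agent runs TD(0) with linear features on its own independent copy of the same Markov reward process, and the weight vectors are averaged a single time at the end. Everything concrete here is an assumption for illustration and is not taken from the paper: the three-state chain `P`, the rewards `r`, the feature table `phi`, the constant step size `alpha`, and the function names `td0_linear`, `one_shot_average`, and `run_one_shot_td0`.

```python
import random


def td0_linear(P, r, phi, gamma, alpha, steps, rng):
    """Run TD(0) with linear features along one sampled trajectory.

    P is a row-stochastic transition matrix, r[s] the reward received in
    state s, and phi[s] the feature vector of state s. Returns the final
    weight vector theta.
    """
    n_states = len(P)
    theta = [0.0] * len(phi[0])
    s = rng.randrange(n_states)
    for _ in range(steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        v = sum(w * x for w, x in zip(theta, phi[s]))
        v_next = sum(w * x for w, x in zip(theta, phi[s_next]))
        # Temporal-difference error for the observed transition.
        delta = r[s] + gamma * v_next - v
        theta = [w + alpha * delta * x for w, x in zip(theta, phi[s])]
        s = s_next
    return theta


def one_shot_average(thetas):
    """Average the agents' weight vectors componentwise (the only communication step)."""
    n = len(thetas)
    return [sum(column) / n for column in zip(*thetas)]


def run_one_shot_td0(num_agents, steps, seed=0):
    """Toy local-state-model run: independent agents, one average at the end."""
    # Hypothetical 3-state Markov reward process (illustrative values only).
    P = [[0.5, 0.5, 0.0],
         [0.25, 0.5, 0.25],
         [0.0, 0.5, 0.5]]
    r = [0.0, 0.0, 1.0]
    phi = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
    gamma, alpha = 0.9, 0.05
    # Each agent gets its own random stream, so no information is shared
    # between agents while they are running.
    thetas = [
        td0_linear(P, r, phi, gamma, alpha, steps, random.Random(seed + i))
        for i in range(num_agents)
    ]
    return thetas, one_shot_average(thetas)


if __name__ == "__main__":
    thetas, theta_bar = run_one_shot_td0(num_agents=8, steps=5000)
    print("averaged weights:", theta_bar)
```

The agents never exchange messages inside the loop; `one_shot_average` is the single communication round. The sketch uses a constant step size for simplicity, which may differ from the step-size schedules analyzed in the paper.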
Related papers
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- Convergence Of TD(0) Under Polynomial Mixing With Nonlinear Function Approximation (2025)
- A Finite Sample Analysis Of Distributional TD Learning With Linear Function Approximation (2025)
- Personalized Multi-agent Average Reward TD-learning Via Joint Linear Approximation (2026)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)
- TD Convergence: An Optimization Perspective (2023)