Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity
2021 · Ziyi Chen, Yi Zhou, Rongrong Chen
Abstract
The finite-time convergence of off-policy TD learning has been comprehensively studied recently. However, this type of convergence has not been well established for off-policy TD learning in the multi-agent setting, which covers broader applications and is fundamentally more challenging. This work develops two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning under Markovian sampling. In particular, our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency. Under Markovian sampling and linear function approximation, we prove that the finite-time sample complexity of both algorithms for achieving an \(\epsilon\)-accurate solution is of order \(\mathcal{O}(\epsilon^{-1}\ln \epsilon^{-1})\), matching the near-optimal sample complexity of centralized TD(0) and TDC. Importantly, the communication complexity of our alg
Related papers
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)
- Variance-reduced Off-policy TDC Learning: Non-asymptotic Convergence Analysis (2020)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- Analysis Of Off-policy Multi-step TD-learning With Linear Function Approximation (2024)
- Exact Formulas For Finite-time Estimation Errors Of Decentralized Temporal Difference Learning With Linear Function Approximation (2022)
- Sample And Communication Efficient Fully Decentralized MARL Policy Evaluation Via A New Approach: Local TD Update (2024)
- Distributed TD(0) With Almost No Communication (2021)