Exact Formulas For Finite-time Estimation Errors Of Decentralized Temporal Difference Learning With Linear Function Approximation
2022 Β· Xingang Guo, Bin Hu
Abstract
In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.
Authors
(none)
Tags
Stats
Related papers
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)0.00
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)9.59
- Characterizing The Exact Behaviors Of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory (2019)0.00
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)0.00
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)0.00
- Uncertainty Quantification For Markov Chain Induced Martingales With Application To Temporal Difference Learning (2025)0.00
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)0.00
- High-probability Sample Complexities For Policy Evaluation With Linear Function Approximation (2023)0.00