An Improved Finite-time Analysis Of Temporal Difference Learning With Deep Neural Networks
2024 · Zhifa Ke, Zaiwen Wen, Junyu Zhang
Abstract
Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general \(L\)-layer neural network. New proof techniques are developed and an improved \(\tilde{\mathcal{O}}(\epsilon^{-1})\) sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an \(\tilde{\mathcal{O}}(\epsilon^{-1})\) complexity under Markovian sampling, as opposed to the best known \(\tilde{\mathcal{O}}(\epsilon^{-2})\) complexity in the existing literature.
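To make the setting concrete, the following is a minimal sketch of semi-gradient TD(0) with a neural network value function, run along a single trajectory so that consecutive samples are Markovian rather than i.i.d. It is an illustration under simplifying assumptions, not the method analyzed in the paper: it uses a two-layer ReLU network instead of a general \(L\)-layer one, approximates a state-value function rather than an action-value function, and omits any projection or other algorithmic details the analysis may rely on. The toy 3-state chain, the hyperparameters, and all function names (`init_network`, `value`, `td0_step`, `run_neural_td`) are hypothetical.

```python
import math
import random


def init_network(state_dim, width, rng):
    """Two-layer ReLU net: V(s) = (1/sqrt(m)) * sum_r b[r] * relu(W[r] . s)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(width)]
    b = [rng.choice((-1.0, 1.0)) for _ in range(width)]  # output signs, kept fixed
    return W, b


def value(W, b, s):
    """Evaluate the network at feature vector s."""
    total = 0.0
    for w_r, b_r in zip(W, b):
        pre = sum(w * x for w, x in zip(w_r, s))
        total += b_r * max(pre, 0.0)
    return total / math.sqrt(len(W))


def td0_step(W, b, s, reward, s_next, gamma, lr):
    """One semi-gradient TD(0) update of W in place; returns the TD error."""
    m = len(W)
    # The bootstrapped target is treated as a constant (no gradient through it).
    delta = reward + gamma * value(W, b, s_next) - value(W, b, s)
    for r in range(m):
        pre = sum(w * x for w, x in zip(W[r], s))
        if pre > 0.0:  # ReLU gradient is zero for inactive units
            scale = lr * delta * b[r] / math.sqrt(m)
            W[r] = [w + scale * x for w, x in zip(W[r], s)]
    return delta


def run_neural_td(num_steps=2000, gamma=0.5, lr=0.1, width=64, seed=0):
    """Run TD(0) along one trajectory of a toy 3-state chain (assumed dynamics)."""
    rng = random.Random(seed)
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    W, b = init_network(3, width, rng)
    state = 0
    for _ in range(num_steps):
        # Toy dynamics: advance around the ring w.p. 0.9, otherwise stay put.
        next_state = (state + 1) % 3 if rng.random() < 0.9 else state
        reward = 1.0 if state == 0 else 0.0
        td0_step(W, b, features[state], reward, features[next_state], gamma, lr)
        state = next_state  # consecutive samples are correlated, not i.i.d.
    return [value(W, b, f) for f in features]


if __name__ == "__main__":
    print(run_neural_td())
```

Calling `run_neural_td()` returns the learned value estimates for the three states. The nonlinearity the abstract refers to shows up in `td0_step`: which hidden units receive an update depends on the current weights through the ReLU activation pattern, so the update direction is not a fixed linear function of the features as it is in linear TD.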
Related papers
- On The Performance Of Temporal Difference Learning With Neural Networks (2023)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- Sample Complexity And Overparameterization Bounds For Temporal Difference Learning With Neural Network Approximation (2021)
- Finite-time Analysis Of Temporal Difference Learning With Experience Replay (2023)
- Simplifying Deep Temporal Difference Learning (2024)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- On The Statistical Benefits Of Temporal Difference Learning (2023)