Finite-time Analysis Of Temporal Difference Learning With Experience Replay
2023 · Han-Dong Lim, Donghwan Lee
Abstract
Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, researchers have only recently begun to actively study its finite-time behavior, including finite-time bounds on the mean squared error and sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL are not yet fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that for both the averaged-iterate and final-iterate cases, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and the size of the mini-batch sampled from it.
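To make the setting concrete, the following is a minimal sketch of TD(0) with a replay buffer, mini-batch sampling, and a constant step-size on a small Markov reward process. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the exact update rule, so the tabular value table, the uniform sampling with replacement, and every name here (`td_with_replay`, `buffer_size`, `batch_size`, the toy chain `P`, `R`) are hypothetical choices.

```python
import random
from collections import deque


def td_with_replay(P, R, gamma=0.9, alpha=0.1, buffer_size=200,
                   batch_size=16, steps=5000, seed=0):
    """Tabular TD(0) that averages TD errors over a replay mini-batch.

    P is a row-stochastic transition matrix (list of lists) and R[s] is the
    reward received on leaving state s. Both are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n = len(P)
    V = [0.0] * n
    buffer = deque(maxlen=buffer_size)  # oldest transitions are evicted first
    s = 0
    for _ in range(steps):
        # Markovian observation: the next state depends on the current state.
        s_next = rng.choices(range(n), weights=P[s])[0]
        buffer.append((s, R[s], s_next))
        s = s_next
        # Uniform mini-batch, drawn with replacement from the buffer.
        batch = [buffer[rng.randrange(len(buffer))] for _ in range(batch_size)]
        # Evaluate every TD error against the current V before updating.
        deltas = [(i, r + gamma * V[j] - V[i]) for i, r, j in batch]
        # Constant step-size update, averaged over the mini-batch.
        for i, d in deltas:
            V[i] += alpha * d / batch_size
    return V


# Toy two-state chain; the exact values for gamma=0.5 are V = [1.5, 0.5].
P = [[0.5, 0.5], [0.5, 0.5]]
R = [1.0, 0.0]
print(td_with_replay(P, R, gamma=0.5))
```

Rerunning the sketch with different `buffer_size` and `batch_size` values gives an informal feel for how the two quantities named in the abstract affect the spread of the estimate around the exact values under a fixed step-size.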
Related papers
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- An Improved Finite-time Analysis Of Temporal Difference Learning With Deep Neural Networks (2024)
- TD Or Not TD: Analyzing The Role Of Temporal Differencing In Deep Reinforcement Learning (2018)
- Discerning Temporal Difference Learning (2023)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Reanalysis Of Variance Reduced Temporal Difference Learning (2020)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- On The Statistical Benefits Of Temporal Difference Learning (2023)