Reanalysis Of Variance Reduced Temporal Difference Learning
2020 · Tengyu Xu, Zhe Wang, Yi Zhou, et al.
Abstract
Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but vanilla TD can suffer substantially from inherent optimization variance. A variance-reduced TD (VRTD) algorithm was proposed by Korda and La (2015), which applies the variance reduction technique directly to online TD learning with Markovian samples. In this work, we first point out technical errors in the analysis of VRTD in Korda and La (2015), and then provide a mathematically solid analysis of the non-asymptotic convergence of VRTD and its variance reduction performance. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate. Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison to those of vanilla TD. As a result, the overall computational
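The abstract describes VRTD only at a high level. The following is a minimal sketch of the SVRG-style update it refers to, not the exact algorithm analyzed in the paper. It assumes linear (here one-hot) features, the standard TD direction g_x(θ) = (r + γφ(s')ᵀθ − φ(s)ᵀθ)φ(s), and a hypothetical two-state Markov chain. It also assumes two simplifications: inner-loop samples are drawn uniformly from the current batch, and the last inner iterate becomes the next reference point. All names (`vrtd`, `td_direction`, `project`, `next_state`, `P`, `R`, `GAMMA`, `PHI`, `THETA_STAR`) and parameter values are illustrative assumptions.

```python
import random

# Hypothetical toy problem (not from the paper): a two-state Markov chain with
# one-hot features, so the TD fixed point is the true value function
# theta* = (I - GAMMA * P)^{-1} R = (28/17, 8/17).
P = [[0.7, 0.3], [0.4, 0.6]]     # P[s][s']: transition probabilities
R = [1.0, 0.0]                   # reward collected in state s
GAMMA = 0.5                      # discount factor
PHI = [[1.0, 0.0], [0.0, 1.0]]   # PHI[s]: feature vector of state s
THETA_STAR = [28 / 17, 8 / 17]   # exact TD fixed point for this chain


def next_state(rng, s):
    """Sample s' ~ P[s][.] (one step of the Markovian trajectory)."""
    return 0 if rng.random() < P[s][0] else 1


def td_direction(theta, s, s_next):
    """TD direction g_x(theta) = (r + gamma*phi(s')^T theta - phi(s)^T theta) * phi(s)."""
    v = sum(f * t for f, t in zip(PHI[s], theta))
    v_next = sum(f * t for f, t in zip(PHI[s_next], theta))
    delta = R[s] + GAMMA * v_next - v
    return [delta * f for f in PHI[s]]


def project(theta, radius):
    """Euclidean projection onto the ball {||theta|| <= radius}."""
    norm = sum(t * t for t in theta) ** 0.5
    return theta if norm <= radius else [t * radius / norm for t in theta]


def vrtd(epochs=10, batch_size=5000, alpha=0.02, radius=10.0, seed=0):
    rng = random.Random(seed)
    s = 0                          # current state of one continuing trajectory
    theta_ref = [0.0, 0.0]         # reference point (outer iterate)
    for _ in range(epochs):
        # 1) Collect M consecutive (Markovian) transitions.
        batch = []
        for _ in range(batch_size):
            s_next = next_state(rng, s)
            batch.append((s, s_next))
            s = s_next
        # 2) Batch-averaged TD direction at the reference point.
        g_ref = [0.0, 0.0]
        for s0, s1 in batch:
            g = td_direction(theta_ref, s0, s1)
            g_ref = [a + b / batch_size for a, b in zip(g_ref, g)]
        # 3) Inner loop with the variance-reduced direction
        #    g_x(theta) - g_x(theta_ref) + g_ref.
        theta = list(theta_ref)
        for _ in range(batch_size):
            # Assumption: inner samples are drawn uniformly from the batch.
            s0, s1 = batch[rng.randrange(batch_size)]
            g_cur = td_direction(theta, s0, s1)
            g_old = td_direction(theta_ref, s0, s1)
            step = [c - o + r for c, o, r in zip(g_cur, g_old, g_ref)]
            theta = project([t + alpha * d for t, d in zip(theta, step)], radius)
        # Assumption: the last inner iterate becomes the new reference point.
        theta_ref = theta
    return theta_ref
```

Calling `vrtd()` should return a parameter close to this toy chain's TD fixed point (28/17, 8/17) ≈ (1.65, 0.47). The batch size enters where the abstract says it does: the reference direction `g_ref` is averaged over `batch_size` samples, so its variance shrinks as the batch grows, while the correction term g_x(θ) − g_x(θ̃) vanishes as θ approaches the reference point.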
Related papers
- Variance-reduced Off-policy TDC Learning: Non-asymptotic Convergence Analysis (2020)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Per-decision Multi-step Temporal Difference Learning With Control Variates (2018)
- Discerning Temporal Difference Learning (2023)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Preferential Temporal Difference Learning (2021)
- Finite-time Analysis Of Temporal Difference Learning With Experience Replay (2023)