PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method
2021 · Ziwei Guan, Tengyu Xu, Yingbin Liang
Abstract
Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method for off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable value function, it is well known that ETD often suffers from large variance, so that its sample complexity can increase exponentially fast with the number of iterations. In this work, we propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts the follow-on trace and updates it only for a finite period in each iteration of the evaluation parameter. Further, PER-ETD increases the restart period logarithmically with the number of iterations, which guarantees the best trade-off between the variance and bias and keeps both vanishing sublinearly. We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to polynomial. Our exper
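The two ingredients named in the abstract, a follow-on trace that is restarted and run for only a finite period, and a restart period that grows logarithmically with the iteration count, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ETD(0) follow-on recursion with unit interest from Sutton et al. (2016), F <- gamma * rho * F + 1, and the function names, the constant `c`, and the exact form of the schedule are hypothetical choices made for illustration.

```python
import math


def restart_period(t, c=2.0):
    """Hypothetical schedule: period grows like c * log(t + 2), at least 1.

    The abstract only states that the period increases logarithmically
    with the number of iterations; the constant and offset are assumptions.
    """
    return max(1, math.ceil(c * math.log(t + 2)))


def truncated_follow_on_trace(rhos, gamma):
    """Follow-on trace restarted at 1 and updated for len(rhos) steps.

    rhos  : importance sampling ratios observed since the restart
    gamma : discount factor
    Assumes the ETD(0) recursion with unit interest:
        F <- gamma * rho * F + 1
    """
    trace = 1.0  # restart value (unit interest assumed)
    for rho in rhos:
        trace = gamma * rho * trace + 1.0
    return trace
```

Because the trace is a product of at most `restart_period(t)` importance ratios rather than of all ratios since the start of the run, its magnitude stays controlled by the period length, which is the intuition behind the variance reduction described in the abstract.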
Related papers
- A First Empirical Study Of Emphatic Temporal Difference Learning (2017)
- Should All Temporal Difference Learning Use Emphasis? (2019)
- Discerning Temporal Difference Learning (2023)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Preferential Temporal Difference Learning (2021)
- O\(^2\)TD: (near)-optimal Off-policy TD Learning (2017)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- Truncated Emphatic Temporal Difference Methods For Prediction And Control (2021)