Should All Temporal Difference Learning Use Emphasis?
2019 · Xiang Gu, Sina Ghiassian, Richard S. Sutton
Abstract
Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address the convergence issues of conventional Temporal Difference (TD) learning under off-policy training, but it differs from conventional TD learning even under on-policy training. A simple counterexample presented in 2017 pointed to a potential class of problems where ETD converges but TD diverges. In this paper, we empirically show that ETD converges on a few other well-known on-policy experiments, whereas TD either diverges or performs poorly. We also show that ETD outperforms TD on the mountain car prediction problem. Our results, together with a similar pattern observed under off-policy training in prior works, suggest that ETD might be a good substitute for conventional TD.
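The abstract notes that ETD differs from TD even on-policy. As a minimal sketch of where that difference comes from, assuming linear function approximation, a constant discount gamma, constant interest i(s) = 1, and lambda = 0 (none of these settings are taken from this page), the standard ETD recursion with importance ratio rho = 1 reduces to TD(0) scaled by a followon trace F_t = gamma * F_{t-1} + i(S_t). The names `td0_update`, `ETD0`, and `dot` are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length weight/feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def td0_update(w: List[float], x: List[float], reward: float,
               x_next: Optional[List[float]], gamma: float, alpha: float,
               terminal: bool = False) -> List[float]:
    """One on-policy linear TD(0) step; returns the new weight vector."""
    v_next = 0.0 if terminal else dot(w, x_next)
    delta = reward + gamma * v_next - dot(w, x)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]


@dataclass
class ETD0:
    """On-policy linear ETD(0) sketch (assumes rho = 1, lambda = 0, constant gamma and interest).

    The emphasis M_t equals the followon trace F_t = gamma * F_{t-1} + i(S_t),
    and the weight update is the TD(0) update multiplied by M_t.
    """
    w: List[float]
    gamma: float
    alpha: float
    interest: float = 1.0   # assumed constant interest i(s)
    followon: float = 0.0   # holds F_{t-1}; reset at the start of each episode

    def start_episode(self) -> None:
        # With F_{-1} = 0, the first state of an episode gets F_0 = i(S_0).
        self.followon = 0.0

    def update(self, x: List[float], reward: float,
               x_next: Optional[List[float]], terminal: bool = False) -> None:
        # Followon trace: F_t = gamma * F_{t-1} + i(S_t)
        self.followon = self.gamma * self.followon + self.interest
        emphasis = self.followon  # M_t = F_t when lambda = 0
        v_next = 0.0 if terminal else dot(self.w, x_next)
        delta = reward + self.gamma * v_next - dot(self.w, x)
        # Emphasis-weighted TD(0) step
        self.w = [wi + self.alpha * emphasis * delta * xi
                  for wi, xi in zip(self.w, x)]


# Usage: a two-step episode with one-hot features
etd = ETD0(w=[0.0, 0.0], gamma=0.9, alpha=0.1)
etd.start_episode()
etd.update([1.0, 0.0], 0.0, [0.0, 1.0])                 # F_0 = 1.0
etd.update([0.0, 1.0], 1.0, None, terminal=True)       # F_1 = 0.9 * 1.0 + 1.0 = 1.9
```

Under these assumptions, setting gamma = 0 makes F_t = 1 at every step and the two updates coincide; for gamma > 0 the emphasis grows along an episode, so later states receive larger effective step sizes than they would under TD(0).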
Related papers
- A First Empirical Study Of Emphatic Temporal Difference Learning (2017)
- PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method (2021)
- Discerning Temporal Difference Learning (2023)
- Preferential Temporal Difference Learning (2021)
- Backstepping Temporal Difference Learning (2023)
- Truncated Emphatic Temporal Difference Methods For Prediction And Control (2021)
- On The Statistical Benefits Of Temporal Difference Learning (2023)
- On The Performance Of Temporal Difference Learning With Neural Networks (2023)