Fixed-horizon Temporal Difference Methods For Stable Reinforcement Learning
2019 · Kristopher de Asis, Alan Chan, Silviu Pitis, et al.
Abstract
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a \(\textit{fixed}\) number of future time steps. To learn the value function for horizon \(h\), these algorithms bootstrap from the value function for horizon \(h-1\), or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as "the deadly triad"). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and \(n\)-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions.
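The central idea above, that the horizon-\(h\) estimate bootstraps from the horizon-\((h-1)\) estimate, can be sketched as a tabular TD(0) update applied to every horizon from a single transition: \(V^h(S_t) \leftarrow V^h(S_t) + \alpha\,[R_{t+1} + \gamma V^{h-1}(S_{t+1}) - V^h(S_t)]\) for \(h = 1, \dots, H\), with \(V^0 \equiv 0\). This is a minimal sketch, not the authors' implementation, and it rests on assumptions the abstract does not state: a tabular state space, a discount factor \(\gamma\) (left at 1 for the plain sum of rewards described above), a zero-valued horizon-0 table, and the hypothetical helper names `fixed_horizon_td0_update` and `learn_two_state_cycle`. Updating all rows in one vectorised step is meant only to illustrate the "parallel updates" the abstract mentions.

```python
import numpy as np


def fixed_horizon_td0_update(V, s, r, s_next, terminal, alpha, gamma=1.0):
    """Apply one tabular fixed-horizon TD(0) update to every horizon at once.

    Assumed layout (hypothetical): V is an (H + 1, n_states) array whose row h
    estimates the sum of the next h (discounted) rewards. Row 0 is the
    horizon-0 value and is never written, so it stays at zero.
    """
    H = V.shape[0] - 1
    # Horizon h bootstraps from horizon h - 1 at the next state, so the
    # targets for rows 1..H read rows 0..H-1; a terminal next state gives 0.
    bootstrap = np.zeros(H) if terminal else V[:H, s_next].copy()
    targets = r + gamma * bootstrap
    # Update rows 1..H for state s in one vectorised step. No row ever
    # appears in its own target, which is the property the abstract
    # credits for avoiding the deadly-triad instability.
    V[1:, s] += alpha * (targets - V[1:, s])
    return V


def learn_two_state_cycle(H=3, sweeps=10, alpha=1.0):
    """Hypothetical demo MDP: state 0 -> 1 pays reward 1, state 1 -> 0 pays 0."""
    V = np.zeros((H + 1, 2))
    for _ in range(sweeps):
        fixed_horizon_td0_update(V, 0, 1.0, 1, terminal=False, alpha=alpha)
        fixed_horizon_td0_update(V, 1, 0.0, 0, terminal=False, alpha=alpha)
    return V


if __name__ == "__main__":
    V = learn_two_state_cycle()
    print("state 0, horizons 1..3:", V[1:, 0])
    print("state 1, horizons 1..3:", V[1:, 1])
```

On this deterministic two-state cycle, the exact fixed-horizon sums from state 0 are 1, 1, 2 for horizons 1 to 3, and from state 1 they are 0, 1, 1; repeated sweeps drive the table to those values, because each row only needs the row below it to be correct at the successor state.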
Related papers
- Discerning Temporal Difference Learning (2023)
- Prediction And Control In Continual Reinforcement Learning (2023)
- On The Statistical Benefits Of Temporal Difference Learning (2023)
- Preferential Temporal Difference Learning (2021)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Differential Temporal Difference Learning (2018)
- Approximate Temporal Difference Learning Is A Gradient Descent For Reversible Policies (2018)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)