Backstepping Temporal Difference Learning
2023 · Han-Dong Lim, Donghwan Lee
Abstract
Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD-learning algorithms, including gradient-TD learning (GTD) and TD-learning with correction (TDC), have been developed. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective and propose a new convergent algorithm. Our method relies on the backstepping technique, which is widely used in nonlinear control theory. Finally, convergence of the proposed algorithm is experimentally verified in environments where standard TD-learning is known to be unstable.
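The divergence mentioned in the abstract can be reproduced on the classic two-state example with scalar features 1 and 2, where only the transition from the first state to the second is ever sampled. The sketch below is a minimal illustration under stated assumptions, not the backstepping algorithm proposed in the paper: the discount factor, step sizes, initial values, and function names (`td0`, `tdc`) are all chosen here for illustration, and the importance-sampling ratio is assumed to be 1. It compares plain off-policy TD(0) with the standard TDC update.

```python
# Two-state example: features phi(s1) = 1, phi(s2) = 2, reward 0.
# Only the transition s1 -> s2 is sampled (an off-policy state distribution).
# All constants below are illustrative assumptions, not values from the paper.
GAMMA = 0.9
PHI, PHI_NEXT = 1.0, 2.0
REWARD = 0.0


def td0(theta, alpha, steps):
    """Plain off-policy TD(0) with linear function approximation."""
    for _ in range(steps):
        delta = REWARD + GAMMA * PHI_NEXT * theta - PHI * theta  # TD error
        theta += alpha * delta * PHI
    return theta


def tdc(theta, w, alpha, beta, steps):
    """TD with gradient correction (TDC); w is the auxiliary weight."""
    for _ in range(steps):
        delta = REWARD + GAMMA * PHI_NEXT * theta - PHI * theta  # TD error
        # Main update: TD step minus the gradient-correction term.
        theta_new = theta + alpha * (delta * PHI - GAMMA * PHI_NEXT * (PHI * w))
        # Auxiliary update: w tracks the expected TD error projected on phi.
        w = w + beta * (delta - PHI * w) * PHI
        theta = theta_new
    return theta, w


if __name__ == "__main__":
    print("TD(0) theta:", td0(1.0, alpha=0.01, steps=5000))
    print("TDC   theta:", tdc(1.0, 0.0, alpha=0.01, beta=0.1, steps=5000)[0])
```

With these settings the TD(0) parameter grows geometrically, since each step multiplies it by 1 + alpha * (2 * GAMMA - 1), while the TDC parameter decays toward the true value of zero. The faster auxiliary step size (`beta` larger than `alpha`) follows the usual two-timescale convention for TDC.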
Related papers
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Approximate Temporal Difference Learning Is A Gradient Descent For Reversible Policies (2018)
- Discerning Temporal Difference Learning (2023)
- Revisiting A Design Choice In Gradient Temporal Difference Learning (2023)
- On A Convergent Off-policy Temporal Difference Learning Algorithm In On-line Learning Environment (2016)
- Analysis Of Off-policy \(n\)-step TD-learning With Linear Function Approximation (2025)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning With Polynomial Sample Complexity (2020)