TD Convergence: An Optimization Perspective
2023 · Kavosh Asadi, Shoham Sabach, Yao Liu, et al.
Abstract
We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
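The view of TD as an optimizer whose objective changes every iteration can be made concrete with a small sketch. The two-state setup below (features, transitions, rewards, and the names `loss`, `grad_w`, `run_td`) is a hypothetical illustration chosen for this page and is not taken from the paper. It assumes linear values with a single scalar weight, a squared loss, and expected (noise-free) updates. Each iteration freezes the target at the current weight, takes one gradient step on the resulting loss, and then lets the target move.

```python
# Hypothetical two-state setup (an assumption for illustration, not from the paper):
# scalar weight w, features phi = [1, 2], zero rewards,
# state 0 -> state 1 and state 1 -> state 1 with probability 1.
PHI = [1.0, 2.0]
NEXT_STATE = [1, 1]
REWARD = [0.0, 0.0]


def loss(theta, w, d, gamma):
    """H(theta, w): d-weighted squared error, with the target built from a frozen theta."""
    total = 0.0
    for s in range(2):
        target = REWARD[s] + gamma * PHI[NEXT_STATE[s]] * theta
        total += 0.5 * d[s] * (target - PHI[s] * w) ** 2
    return total


def grad_w(theta, w, d, gamma):
    """Gradient of H(theta, w) with respect to w only; theta is held fixed."""
    g = 0.0
    for s in range(2):
        target = REWARD[s] + gamma * PHI[NEXT_STATE[s]] * theta
        g += -d[s] * PHI[s] * (target - PHI[s] * w)
    return g


def run_td(d, gamma=0.9, alpha=0.1, steps=200, w0=1.0):
    """Expected TD(0): one gradient step on H(w_t, .) per iteration, then the target moves."""
    w = w0
    for _ in range(steps):
        # The target parameter is set to the current weight before each step,
        # so the function being minimized is different at every iteration.
        w = w - alpha * grad_w(w, w, d, gamma)
    return w
```

In this toy setup the true value is zero everywhere, so a convergent run should drive `w` toward 0. Weighting only the self-looping state, `run_td([0.0, 1.0])`, shrinks the weight each step, while weighting only the first state, `run_td([1.0, 0.0])`, makes it grow without bound. This is the kind of convergent versus divergent behavior the abstract refers to, though the paper's own counterexample and analysis may differ from this sketch.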
Related papers
- Geometric Insights Into The Convergence Of Nonlinear TD Learning (2019)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Neural Temporal-difference And Q-learning Provably Converge To Global Optima (2019)
- Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning (2020)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Analysis Of Off-policy n-step TD-learning With Linear Function Approximation (2025)
- Gradient Temporal-difference Learning With Regularized Corrections (2020)
- Backstepping Temporal Difference Learning (2023)