Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning
2020 · Shuang Qiu, Zhuoran Yang, Xiaohan Wei, et al.
Abstract
Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy evaluation has achieved great success in modern reinforcement learning. It has been shown that this problem can be reformulated as a stochastic nonconvex-strongly-concave optimization problem. Such a problem is challenging because the naive stochastic gradient descent-ascent algorithm suffers from slow convergence. Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, some of which also require large-batch sampling. In practice, however, a single-timescale single-loop stochastic algorithm is preferred because it is simpler and its step size is easier to tune. In this paper, we propose two single-timescale single-loop algorithms, each of which requires only one data point per step. Our first algorithm implements momentum updates on both the primal and dual variables and achieves an \(O(\epsilon^{-4})\) sample complexity, which shows the important role of momentum
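As a rough illustration only, the sketch below runs a generic single-loop, single-timescale stochastic gradient descent-ascent with moving-average momentum on both variables, drawing one sample per step. It uses a hypothetical scalar toy objective \(f(x, y) = y \sin x - \tfrac{A}{2} y^2\), which, like the TD reformulation, is nonconvex in the primal variable \(x\) and strongly concave in the dual variable \(y\). The objective, the momentum rule, and the constants `A`, `lr`, `beta`, and `noise` are assumptions chosen for illustration. They are not the paper's algorithm, objective, or step-size schedule.

```python
import math
import random

A = 2.0  # hypothetical strong-concavity modulus of the toy objective in y


def stochastic_grads(x, y, rng, noise=0.05):
    """One-sample noisy gradients of the hypothetical toy objective
        f(x, y) = y * sin(x) - (A / 2) * y**2,
    which is nonconvex in x and A-strongly concave in y. It only mimics the
    nonconvex-strongly-concave shape; it is NOT the paper's TD objective."""
    gx = y * math.cos(x) + noise * rng.gauss(0.0, 1.0)
    gy = math.sin(x) - A * y + noise * rng.gauss(0.0, 1.0)
    return gx, gy


def momentum_sgda(x0, y0, steps=20_000, lr=0.05, beta=0.1, seed=0):
    """Single-loop, single-timescale SGDA with moving-average momentum on
    both variables: one sample per iteration, one shared step size lr."""
    rng = random.Random(seed)
    x, y = x0, y0
    # Momentum buffers start from the first one-sample gradient.
    mx, my = stochastic_grads(x, y, rng)
    for _ in range(steps):
        x -= lr * mx  # descent step on the primal variable
        y += lr * my  # ascent step on the dual variable
        gx, gy = stochastic_grads(x, y, rng)
        mx = (1.0 - beta) * mx + beta * gx  # primal momentum (moving average)
        my = (1.0 - beta) * my + beta * gy  # dual momentum (moving average)
    return x, y


if __name__ == "__main__":
    x, y = momentum_sgda(x0=1.0, y0=0.0)
    # Phi(x) = max_y f(x, y) = sin(x)**2 / (2A), so Phi'(x) = sin(x)cos(x)/A.
    print("grad Phi:", math.sin(x) * math.cos(x) / A)
    # The maximizer is y*(x) = sin(x) / A; this measures dual tracking.
    print("dual gap:", y - math.sin(x) / A)
```

Because the toy \(f\) is \(A\)-strongly concave in \(y\), \(\Phi(x) = \max_y f(x, y) = \sin^2 x / (2A)\) with maximizer \(y^*(x) = \sin x / A\). The two printed quantities therefore measure primal stationarity and how well the dual iterate tracks its maximizer. Sharing one step size between \(x\) and \(y\) is what makes this loop single-timescale. A two-timescale method would instead give the dual variable a larger (faster) step size than the primal one.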
Related papers
- Differentially Private Temporal Difference Learning With Stochastic Nonconvex-strongly-concave Optimization (2022)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Geometric Insights Into The Convergence Of Nonlinear TD Learning (2019)
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)
- Analysis Of Off-policy \(n\)-step TD-learning With Linear Function Approximation (2025)
- Non-asymptotic Analysis For Two Time-scale TDC With General Smooth Function Approximation (2021)
- TD Convergence: An Optimization Perspective (2023)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)