A Tale Of Two-timescale Reinforcement Learning With The Tightest Finite-time Bound
2019 Β· Gal Dalal, Balazs Szorenyi, Gugan Thoppe
Abstract
Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, \(\theta_n\) and \(w_n,\) which are updated using two distinct stepsize sequences, \(\alpha_n\) and \(\beta_n,\) respectively. Assuming \(\alpha_n = n^\{-\alpha\}\) and \(\beta_n = n^\{-\beta\}\) with \(1 > \alpha > \beta > 0,\) we show that, with high probability, the two iterates converge to their respective solutions \(\theta^*\) and \(w^*\) at rates given by \(\|\theta_n - \theta^*\| = \tilde\{O\}( n^\{-\alpha/2\})\) and \(\|w_n - w^*\| = \tilde\{O\}(n^\{-\beta/2\});\) here, \(\tilde\{O\}\) hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieve
Authors
(none)
Tags
Stats
Related papers
- Finite-time Performance Bounds And Adaptive Learning Rate Selection For Two Time-scale Reinforcement Learning (2019)0.00
- Sample Complexity Bounds For Two Timescale Value-based Reinforcement Learning Algorithms (2020)0.00
- Fast Two-time-scale Stochastic Gradient Method With Applications In Reinforcement Learning (2024)0.00
- Finite Sample Analysis Of Two-timescale Stochastic Approximation With Applications To Reinforcement Learning (2017)0.00
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)0.00
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)0.00
- Non-asymptotic Analysis For Two Time-scale TDC With General Smooth Function Approximation (2021)0.00
- Finite-sample Analysis Of Proximal Gradient TD Algorithms (2020)0.00