Least-squares Temporal Difference Learning For The Linear Quadratic Regulator
2017 · Stephen Tu, Benjamin Recht
Abstract
Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within \(\epsilon\)-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and Mendelson in the independent covariates setting. Finally, we provide experimental evidence indicating that
Related papers
- The Gap Between Model-based And Model-free Methods On The Linear Quadratic Regulator: An Asymptotic Viewpoint (2018)
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)
- Learning The Linear Quadratic Regulator From Nonlinear Observations (2020)
- An Efficient Off-policy Reinforcement Learning Algorithm For The Continuous-time LQR Problem (2023)