Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens
2024 Β· Amirreza Neshaei Moghaddam, Alex Olshevsky, Bahman Gharesifard
Abstract
We provide the first known algorithm that provably achieves \(\epsilon\)-optimality within \(\widetilde\{\mathcal\{O\}\}(1/\epsilon)\) function evaluations for the discounted discrete-time LQR problem with unknown parameters, without relying on two-point gradient estimates. These estimates are known to be unrealistic in many settings, as they depend on using the exact same initialization, which is to be selected randomly, for two different policies. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates, which either leads to \(\widetilde\{\mathcal\{O\}\}(1/\epsilon^2)\) rates or heavily relies on stability assumptions.
Authors
(none)
Tags
Stats
Related papers
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)0.00
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019)0.00
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)0.00
- On The Optimization Landscape Of Dynamic Output Feedback: A Case Study For Linear Quadratic Regulator (2022)4.52
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)0.00
- Oracle Complexity Reduction For Model-free LQR: A Stochastic Variance-reduced Policy Gradient Approach (2023)2.26
- The Gap Between Model-based And Model-free Methods On The Linear Quadratic Regulator: An Asymptotic Viewpoint (2018)0.00
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)11.19