Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{T}\) Regret
2021 · Asaf Cassel, Tomer Koren
Abstract
We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
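To make the setting concrete, the sketch below simulates a linear system \(x_{t+1} = A x_t + B u_t + w_t\) with per-step cost \(x_t^\top Q x_t + u_t^\top R u_t\) under a linear policy \(u_t = -K x_t\), and estimates the gradient of the average cost with respect to \(K\) using only observed costs. It is a generic two-point zeroth-order estimator written for illustration, not the algorithm analyzed in the paper; the function names, the two-point form, and all parameter values are assumptions. Note also that it re-runs the system from scratch for each perturbation, which a single-trajectory online learner cannot do.

```python
import numpy as np

def rollout_cost(A, B, Q, R, K, horizon, noise_std, rng):
    """Average per-step quadratic cost of the linear policy u = -K x."""
    x = np.zeros(A.shape[0])
    total = 0.0
    for _ in range(horizon):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        # Gaussian process noise is an assumption made for this illustration.
        x = A @ x + B @ u + noise_std * rng.standard_normal(A.shape[0])
    return total / horizon

def zeroth_order_gradient(A, B, Q, R, K, radius, n_samples,
                          horizon, noise_std, rng):
    """Two-point estimate of the gradient of the average cost w.r.t. K.

    Uses cost evaluations only (no knowledge of A or B inside the
    estimator), in the spirit of model-free policy gradient.
    """
    d = K.size
    grad = np.zeros_like(K, dtype=float)
    for _ in range(n_samples):
        # Random direction, uniform on the unit sphere in policy space.
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        c_plus = rollout_cost(A, B, Q, R, K + radius * U,
                              horizon, noise_std, rng)
        c_minus = rollout_cost(A, B, Q, R, K - radius * U,
                               horizon, noise_std, rng)
        grad += (d / (2.0 * radius)) * (c_plus - c_minus) * U
    return grad / n_samples
```

A gradient step would then be `K = K - step_size * grad`, with `step_size` and `radius` chosen small enough that the perturbed policies keep the closed loop \(A - BK\) stable; the estimator is only meaningful for stabilizing \(K\).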
Related papers
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Meta-learning Linear Quadratic Regulators: A Policy Gradient MAML Approach For Model-free LQR (2024)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)
- The Gap Between Model-based And Model-free Methods On The Linear Quadratic Regulator: An Asymptotic Viewpoint (2018)
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)
- Foundations Of Safe Online Reinforcement Learning In The Linear Quadratic Regulator: \(\sqrt{T}\)-regret (2025)
- On The Optimization Landscape Of Dynamic Output Feedback: A Case Study For Linear Quadratic Regulator (2022)