The Gap Between Model-based And Model-free Methods On The Linear Quadratic Regulator: An Asymptotic Viewpoint
2018 · Stephen Tu, Benjamin Recht
Abstract
The effectiveness of model-based versus model-free methods is a long-standing question in reinforcement learning (RL). Motivated by recent empirical success of RL on continuous control tasks, we study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). We show that for policy evaluation, a simple model-based plugin method requires asymptotically fewer samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of state dimension. For policy optimization, we study a simple family of problem instances and show that nominal (certainty equivalence principle) control also requires several factors of state and input dimension fewer samples than the policy gradient method to reach the same level of control performance on these instances. Furthermore, the gap persists even when employing commonly used baseli
Related papers
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019)
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Meta-learning Linear Quadratic Regulators: A Policy Gradient MAML Approach For Model-free LQR (2024)
- On The Optimization Landscape Of Dynamic Output Feedback: A Case Study For Linear Quadratic Regulator (2022)
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)