Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator
2019 Β· Karl Krauth, Stephen Tu, Benjamin Recht
Abstract
We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within \(\epsilon\) of the optimal LQR controller, each step of policy evaluation requires at most \((n+d)^3/\epsilon^2\) samples, where \(n\) is the dimension of the state vector and \(d\) is the dimension of the input vector. On the other hand, only \(log(1/\epsilon)\) policy improvement steps suffice, resulting in an overall sample complexity of \((n+d)^3 \epsilon^\{-2\} log(1/\epsilon)\). We furthermore build on our analysis and construct a simple adaptive procedure based on \(\epsilon\)-greedy
Authors
(none)
Tags
Stats
Related papers
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)0.00
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)0.00
- The Gap Between Model-based And Model-free Methods On The Linear Quadratic Regulator: An Asymptotic Viewpoint (2018)0.00
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)11.19
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)8.60
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)0.00
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)0.00
- Learning The Linear Quadratic Regulator From Nonlinear Observations (2020)0.00