Oracle Complexity Reduction For Model-free LQR: A Stochastic Variance-reduced Policy Gradient Approach
2023 · Leonardo F. Toso, Han Wang, James Anderson
Abstract
We investigate the problem of learning an \(\epsilon\)-approximate solution to the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Although policy gradient methods have been proven to converge linearly to the optimal solution of the model-free LQR problem, the large number of two-point cost queries they require for gradient estimation may be prohibitive, particularly in applications where evaluating the cost at two distinct control input configurations is exceptionally costly. To address this, we propose an oracle-efficient approach. Our method combines one-point and two-point gradient estimates in a dual-loop variance-reduced algorithm. It achieves an approximately optimal solution with only \(O\left(\log\left(1/\epsilon\right)^{\beta}\right)\) two-point cost queries for \(\beta \in (0,1)\).
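The dual-loop idea in the abstract can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a hypothetical scalar LQR instance (`A`, `B`, `Q`, `R`), an exact closed-form cost in place of simulated rollouts, a smoothing radius `r`, and an SVRG-style update. Each outer iteration spends a mini-batch of two-point queries on a snapshot gradient, and each inner step uses only one-point estimates at the current and snapshot gains with a shared random direction. All function names and parameter values are illustrative.

```python
import math
import random

# Hypothetical scalar LQR instance, chosen for illustration only:
#   x_{t+1} = A x_t + B u_t,  u_t = -k x_t,  stage cost Q x_t^2 + R u_t^2.
# The open loop (A = 1.2) is unstable; gains with |A - B k| < 1 are stabilizing.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0


def cost(k: float) -> float:
    """Closed-form infinite-horizon cost for x0 with E[x0^2] = 1.

    Assumption: an exact cost oracle stands in for simulated rollouts.
    """
    a_cl = A - B * k
    if abs(a_cl) >= 1.0:
        return math.inf  # closed loop is unstable, so the cost diverges
    return (Q + R * k * k) / (1.0 - a_cl * a_cl)


def exact_grad(k: float) -> float:
    """Analytic dC/dk, used only to sanity-check the estimators."""
    a_cl = A - B * k
    s = 1.0 - a_cl * a_cl
    return (2.0 * R * k * s - (Q + R * k * k) * 2.0 * B * a_cl) / (s * s)


def optimal_gain() -> float:
    """k* from the scalar discrete-time algebraic Riccati equation."""
    # P solves B^2 P^2 + (R - A^2 R - Q B^2) P - Q R = 0 (positive root).
    c1 = R - A * A * R - Q * B * B
    p = (-c1 + math.sqrt(c1 * c1 + 4.0 * B * B * Q * R)) / (2.0 * B * B)
    return A * B * p / (R + B * B * p)


# Zeroth-order estimators with d = 1, so the direction u is uniform on {-1, +1}.
def one_point(k: float, u: float, r: float) -> float:
    """(d / r) * C(k + r u) * u: a single cost query."""
    return cost(k + r * u) * u / r


def two_point(k: float, u: float, r: float) -> float:
    """(d / (2 r)) * [C(k + r u) - C(k - r u)] * u: a paired (two-point) query."""
    return (cost(k + r * u) - cost(k - r * u)) * u / (2.0 * r)


def svrpg(k0, outer=20, inner=5, batch=4, step=0.01, r=0.1, seed=0):
    """SVRG-style dual loop; returns the final gain and the query counts."""
    rng = random.Random(seed)
    k = k0
    n_two = n_one = 0
    for _ in range(outer):
        k_snap = k
        # Outer loop: snapshot gradient from a mini-batch of two-point estimates.
        g_snap = sum(two_point(k_snap, rng.choice((-1.0, 1.0)), r)
                     for _ in range(batch)) / batch
        n_two += batch
        for _ in range(inner):
            u = rng.choice((-1.0, 1.0))
            # Inner loop: one-point estimates at the current and snapshot gains
            # share u, so their difference corrects the stale snapshot gradient.
            g = one_point(k, u, r) - one_point(k_snap, u, r) + g_snap
            n_one += 2
            k -= step * g
    return k, n_two, n_one


if __name__ == "__main__":
    k_final, n_two, n_one = svrpg(k0=1.2)
    print(f"k = {k_final:.4f}, k* = {optimal_gain():.4f}, "
          f"two-point queries = {n_two}, one-point queries = {n_one}")
```

Calling `svrpg(k0=1.2)` should move the gain toward `optimal_gain()`, up to a small bias introduced by the smoothing radius `r`. Sharing the direction `u` between the two one-point evaluations is what keeps the correction term small while `k` stays close to `k_snap`; with independent directions the one-point noise, of order \(C/r\), would not cancel. The sketch does not attempt to reproduce the paper's \(O\left(\log\left(1/\epsilon\right)^{\beta}\right)\) guarantee.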
Related papers
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023)
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)
- Policy Gradient For LQR With Domain Randomization (2025)