Rate-matching The Regret Lower-bound In The Linear Quadratic Regulator With Unknown Dynamics
2022 Β· Feicheng Wang, Lucas Janson
Abstract
The theory of reinforcement learning currently suffers from a mismatch between its empirical performance and the theoretical characterization of its performance, with consequences for, e.g., the understanding of sample efficiency, safety, and robustness. The linear quadratic regulator with unknown dynamics is a fundamental reinforcement learning setting with significant structure in its dynamics and cost function, yet even in this setting there is a gap between the best known regret lower-bound of \(Ξ©_p(\sqrt\{T\})\) and the best known upper-bound of \(O_p(\sqrt\{T\}\,\text\{polylog\}(T))\). The contribution of this paper is to close that gap by establishing a novel regret upper-bound of \(O_p(\sqrt\{T\})\). Our proof is constructive in that it analyzes the regret of a concrete algorithm, and simultaneously establishes an estimation error bound on the dynamics of \(O_p(T^\{-1/4\})\) which is also the first to match the rate of a known lower-bound. The two keys to our improved proof tec
Authors
(none)
Tags
Stats
Related papers
- Foundations Of Safe Online Reinforcement Learning In The Linear Quadratic Regulator: \(\sqrt{t}\)-regret (2025)0.00
- Regret Bounds For Episodic Risk-sensitive Linear Quadratic Regulator (2024)0.00
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021)0.00
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)0.00
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021)0.00
- Refined Regret For Adversarial Mdps With Linear Function Approximation (2023)0.00
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)0.00
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)0.00