Logarithmic Regret For Episodic Continuous-time Linear-quadratic Reinforcement Learning Over A Finite-time Horizon
2020 · Matteo Basei, Xin Guo, Anran Hu, et al.
Abstract
We study finite-time horizon continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order \(O((\ln M)(\ln\ln M))\), with \(M\) being the number of learning episodes. The analysis consists of two parts: perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation; and parameter estimation error, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves similar logarithmic regret with an additional term depending explicitly on the time stepsizes used in the algorithm.
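The discrete-time variant mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes dynamics of the form dX = (A X + B U) dt + dW, simulates them with an Euler scheme, and uses randomly drawn piecewise constant controls for exploration instead of a Riccati-based feedback policy. The names `simulate_episode`, `least_squares_estimate`, `estimate_demo`, the matrices `A_TRUE` and `B_TRUE`, and the small ridge term are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical "true" coefficients, unknown to the learner in the actual problem.
A_TRUE = np.array([[-1.0, 0.5], [0.0, -0.5]])
B_TRUE = np.array([[1.0], [0.5]])


def simulate_episode(A, B, x0, T, n_steps, rng, control_scale=2.0):
    """Euler simulation of dX = (A X + B U) dt + dW.

    The control is held constant on each time step and drawn at random
    (an exploration assumption of this sketch, not the paper's policy).
    """
    dt = T / n_steps
    d, p = B.shape
    X = np.empty((n_steps + 1, d))
    U = rng.normal(scale=control_scale, size=(n_steps, p))
    X[0] = x0
    for i in range(n_steps):
        noise = rng.normal(scale=np.sqrt(dt), size=d)
        X[i + 1] = X[i] + (A @ X[i] + B @ U[i]) * dt + noise
    return X, U


def least_squares_estimate(episodes, dt, ridge=1e-8):
    """Regress state increments on Z = (X, U) across all episodes.

    Solves (sum Z Z^T dt + ridge I) theta = sum Z dX^T, so that
    theta^T approximates the block matrix [A B].
    """
    d = episodes[0][0].shape[1]
    p = episodes[0][1].shape[1]
    G = np.zeros((d + p, d + p))
    H = np.zeros((d + p, d))
    for X, U in episodes:
        Z = np.hstack([X[:-1], U])   # regressors at the left endpoints
        dX = np.diff(X, axis=0)      # observed state increments
        G += Z.T @ Z * dt
        H += Z.T @ dX
    theta = np.linalg.solve(G + ridge * np.eye(d + p), H)
    return theta[:d].T, theta[d:].T  # (A_hat, B_hat)


def estimate_demo(seed=0, n_episodes=200, T=1.0, n_steps=100):
    """Collect several episodes and return the pooled estimates."""
    rng = np.random.default_rng(seed)
    x0 = np.array([1.0, -1.0])
    episodes = [
        simulate_episode(A_TRUE, B_TRUE, x0, T, n_steps, rng)
        for _ in range(n_episodes)
    ]
    return least_squares_estimate(episodes, T / n_steps)


if __name__ == "__main__":
    A_hat, B_hat = estimate_demo()
    print("A_hat =\n", A_hat)
    print("B_hat =\n", B_hat)
```

In this sketch the simulator and the regression share the same step size, so the estimator carries no discretization bias; with genuinely continuous dynamics observed on a grid, the step size would enter the error, which is consistent with the stepsize-dependent term the abstract describes.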
Related papers
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Regret Bounds For Episodic Risk-sensitive Linear Quadratic Regulator (2024)
- Square-root Regret Bounds For Continuous-time Episodic Markov Decision Processes (2022)
- Logarithmic Regret For Nonlinear Control (2025)
- Logarithmic Regret Bounds For Continuous-time Average-reward Markov Decision Processes (2022)
- The Best Of Both Worlds: Reinforcement Learning With Logarithmic Regret And Policy Switches (2022)
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021)
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)