Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems
2024 · Yilie Huang, Yanwei Jia, Xun Yu Zhou
Abstract
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of \(O(N^{\frac{3}{4}})\) up to a logarithmic factor, where \(N\) is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparis
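To make the problem class in the abstract concrete, the following is a minimal sketch of one simulated episode, not the authors' algorithm. It assumes scalar dynamics of the form dx = (A x + B u) dt + (C x + D u) dW, a linear feedback policy u = phi * x, and Gaussian exploration with a fixed standard deviation. The function name `simulate_episode`, the coefficient values, and the constant `explore_std` are all hypothetical choices for illustration; the exploration schedule itself is not specified in the abstract and is not reproduced here.

```python
import numpy as np


def simulate_episode(phi, explore_std, rng, x0=1.0, T=1.0, n_steps=100,
                     A=-0.5, B=1.0, C=0.2, D=0.3):
    """Euler-Maruyama path of dx = (A x + B u) dt + (C x + D u) dW.

    All coefficients are illustrative assumptions. The volatility term
    (C x + D u) depends on both the state and the control.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Linear feedback plus Gaussian exploration noise (assumed form).
        u = phi * x[k] + explore_std * rng.standard_normal()
        # Brownian increment over one time step.
        dW = np.sqrt(dt) * rng.standard_normal()
        x[k + 1] = x[k] + (A * x[k] + B * u) * dt + (C * x[k] + D * u) * dW
    return x


rng = np.random.default_rng(0)
path = simulate_episode(phi=-0.8, explore_std=0.1, rng=rng)
print(path.shape)
```

A model-free learner in this setting would only observe paths like `path` and the associated rewards, never the coefficients A, B, C, D; they appear here solely to drive the simulator.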
Related papers
- Logarithmic Regret For Episodic Continuous-time Linear-quadratic Reinforcement Learning Over A Finite-time Horizon (2020) 7.81
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017) 0.00
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{t}\) Regret (2021) 0.00
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024) 0.00
- Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning (2022) 4.52
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023) 0.00
- Learning The Linear Quadratic Regulator From Nonlinear Observations (2020) 0.00
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019) 0.00