An Efficient Off-policy Reinforcement Learning Algorithm For The Continuous-time LQR Problem
2023 · Victor G. Lopez, Matthias A. Müller
Abstract
In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. In contrast to other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which makes this step more efficient to solve. Finally, a method is proposed to determine, using only measured data, a stabilizing policy with which to initialize the algorithm.
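To make the terms "policy evaluation", "stabilizing initial policy" and "convergence to the optimal control input" concrete, the following is a minimal sketch of classical model-based policy iteration (Kleinman's iteration) for a scalar continuous-time LQR problem. It is not the data-driven algorithm of the paper: it assumes the model parameters a and b are known, and in the scalar case the policy evaluation step reduces to a one-line Lyapunov equation rather than a Sylvester-transpose equation. All function and variable names are hypothetical and chosen for illustration only.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iters=20):
    """Model-based policy iteration for x' = a*x + b*u with u = -k*x.

    Cost: integral of q*x**2 + r*u**2. Assumes a and b are known,
    unlike the data-driven setting described in the abstract.
    Returns a list of (p, k) pairs, one per iteration.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must be stabilizing: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: scalar Lyapunov equation
        #   2*(a - b*k)*p + q + r*k**2 = 0
        p = (q + r * k**2) / (2.0 * (b * k - a))
        # Policy improvement: k = b*p/r
        k = b * p / r
        history.append((p, k))
    return history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0."""
    return r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2


if __name__ == "__main__":
    hist = kleinman_scalar(1.0, 1.0, 1.0, 1.0, k0=3.0)
    print("first value p:", hist[0][0])
    print("final value p:", hist[-1][0])
    print("Riccati solution:", riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

With a = b = q = r = 1 and the stabilizing initial gain k0 = 3, the first evaluation gives p = 2.5, and the iterates decrease toward the Riccati solution 1 + sqrt(2). The requirement that k0 be stabilizing is why a method for finding an initial stabilizing policy is needed before the iteration can start.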
Related papers
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Least-squares Temporal Difference Learning For The Linear Quadratic Regulator (2017)
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020)
- Finite-time Analysis Of Approximate Policy Iteration For The Linear Quadratic Regulator (2019)
- A Tour Of Reinforcement Learning: The View From Continuous Control (2018)
- Learning The Linear Quadratic Regulator From Nonlinear Observations (2020)
- Sample Complexity Of The Linear Quadratic Regulator: A Reinforcement Learning Lens (2024)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)