Foundations Of Safe Online Reinforcement Learning In The Linear Quadratic Regulator: \(\sqrt{T}\)-Regret
2025 · Benjamin Schiffer, Lucas Janson
Abstract
Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement learning is difficult due to the complex interaction between safety, exploration, and exploitation. In this work, we seek to establish foundations for safety-constrained reinforcement learning by studying the canonical problem of controlling a one-dimensional linear dynamical system with unknown dynamics. We study the safety-constrained version of this problem, where the state must with high probability stay within a safe region, and we provide the first safe algorithm that achieves regret of \(\tilde{O}_T(\sqrt{T})\). Furthermore, the regret is with respect to the baseline of truncated linear controllers, a natural baseline of non-linear controllers that are well-suited for safety-constrained linear systems. In addition to introducing this new baseline
Related papers
- Rate-matching The Regret Lower-bound In The Linear Quadratic Regulator With Unknown Dynamics (2022), score 0.00
- Online Policy Gradient For Model Free Learning Of Linear Quadratic Regulators With \(\sqrt{T}\) Regret (2021), score 0.00
- Regret Bounds For Episodic Risk-sensitive Linear Quadratic Regulator (2024), score 0.00
- Implications Of Regret On Stability Of Linear Dynamical Systems (2022), score 6.34
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021), score 0.00
- Logarithmic Regret For Nonlinear Control (2025), score 0.00
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024), score 0.00
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020), score 11.19