Abstract

Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement learning is difficult due to the complex interaction between safety, exploration, and exploitation. In this work, we seek to establish foundations for safety-constrained reinforcement learning by studying the canonical problem of controlling a one-dimensional linear dynamical system with unknown dynamics. We study the safety-constrained version of this problem, where the state must with high probability stay within a safe region, and we provide the first safe algorithm that achieves regret of \(\tilde\{O\}_T(\sqrt\{T\})\). Furthermore, the regret is with respect to the baseline of truncated linear controllers, a natural baseline of non-linear controllers that are well-suited for safety-constrained linear systems. In addition to introducing this new baseline

Authors

(none)

Tags

  • Safe RL
  • Exploration

Stats

Related papers