Sample And Computationally Efficient Continuous-time Reinforcement Learning With General Function Approximation

Abstract

Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of \(\tilde{O}(\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}N^{-1/2})\) using \(N\) measurements, where \(d_{\mathcal{R}}\) and \(d_{\mathcal{F}}\) denote the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing the complexity of general function approximation in reinforcement learning.
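
As a rough reading of the stated rate, one can invert it to see how many measurements a given accuracy would need. This is only a sketch: it ignores the constants and logarithmic factors hidden in \(\tilde{O}\), and the target gap \(\epsilon\) is a symbol introduced here for illustration, not taken from the abstract. Setting the gap to at most \(\epsilon\) and solving for \(N\) gives

\[
\sqrt{d_{\mathcal{R}} + d_{\mathcal{F}}}\, N^{-1/2} \le \epsilon
\quad\Longleftrightarrow\quad
N \ge \frac{d_{\mathcal{R}} + d_{\mathcal{F}}}{\epsilon^{2}}.
\]

Under this reading, halving the target gap requires roughly four times as many measurements, while the measurement count grows linearly in \(d_{\mathcal{R}} + d_{\mathcal{F}}\).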
