Abstract

Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap of \(\tilde\{O\}(\sqrt\{d_\{\mathcal\{R\}\} + d_\{\mathcal\{F\}\}\}N^\{-1/2\})\) using \(N\) measurements, where \(d_\{\mathcal\{R\}\}\) and \(d_\{\mathcal\{F\}\}\) denote the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing the complexity of general function approximation in reinforcement learnin

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyzhao2025sample

Related papers