Continuous-time Reinforcement Learning: Ellipticity Enables Model-free Value Function Approximation
2026 · Wenlong Mou
Abstract
We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with function approximation that learn value and advantage functions directly from data, without unrealistic structural assumptions on the dynamics. Leveraging the ellipticity of the diffusions, we establish a new class of Hilbert-space positive definiteness and boundedness properties for the Bellman operators. Based on these properties, we propose the Sobolev-prox fitted \(q\)-learning algorithm, which learns value and advantage functions by iteratively solving least-squares regression problems. We derive oracle inequalities for the estimation error, governed by (i) the best approximation error of the function classes, (ii) their localized complexity, (iii) exponentially decaying optimization error, and (iv) numerical discretization error. These results identify ellipticity as a key structural property th
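To make "iteratively solving least-squares regression problems" concrete, here is a minimal sketch of plain fitted Q-iteration on discretely observed diffusion data. It is not the Sobolev-prox fitted \(q\)-learning algorithm described above, only the generic template that such methods refine. Everything specific is an assumption made for illustration: the one-dimensional dynamics dX = a dt + sigma dW, the two-action set, the reward -x^2, the discount rate, the piecewise-constant function class, and all function names.

```python
import math
import random

ACTIONS = (-1.0, 1.0)  # hypothetical two-action set: drift left or drift right


def simulate(n, dt, sigma, rng):
    """One-step Euler-Maruyama transitions of dX = a dt + sigma dW under random actions."""
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        k = rng.randrange(len(ACTIONS))  # off-policy data: uniform behaviour policy
        x_next = x + ACTIONS[k] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        reward = -x * x * dt  # running reward accumulated over one observation step
        data.append((x, k, reward, x_next))
    return data


def bin_index(x, n_bins, lo=-2.0, hi=2.0):
    """Map a state to a bin; states outside [lo, hi] are clipped to the edge bins."""
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def fitted_q_iteration(data, n_bins, gamma, n_iters):
    """Piecewise-constant table q[bin][action], refit by least squares in every round."""
    q = [[0.0] * len(ACTIONS) for _ in range(n_bins)]
    for _ in range(n_iters):
        sums = [[0.0] * len(ACTIONS) for _ in range(n_bins)]
        counts = [[0] * len(ACTIONS) for _ in range(n_bins)]
        for x, k, reward, x_next in data:
            # Bellman target built from the previous round's estimate.
            target = reward + gamma * max(q[bin_index(x_next, n_bins)])
            b = bin_index(x, n_bins)
            sums[b][k] += target
            counts[b][k] += 1
        # With indicator features, the least-squares fit is the per-cell mean of the targets.
        q = [[sums[b][k] / counts[b][k] if counts[b][k] else q[b][k]
              for k in range(len(ACTIONS))] for b in range(n_bins)]
    return q


rng = random.Random(0)
dt = 0.1
data = simulate(n=5000, dt=dt, sigma=0.5, rng=rng)
q = fitted_q_iteration(data, n_bins=8, gamma=math.exp(-1.0 * dt), n_iters=100)
greedy = [ACTIONS[max(range(len(ACTIONS)), key=lambda k: row[k])] for row in q]
```

The piecewise-constant class is chosen only because its regression step has a closed form and stays stable across iterations. In this toy setting, the bin width plays the role of the approximation error, the number of samples per cell that of the statistical error, the number of rounds that of the optimization error, and `dt` that of the discretization error.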
Related papers
- Continuous-time Value Function Approximation In Reproducing Kernel Hilbert Spaces (2018)
- Value-distributional Model-based Reinforcement Learning (2023)
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- SBEED: Convergent Reinforcement Learning With Nonlinear Function Approximation (2017)
- Adaptive Approximate Policy Iteration (2020)
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)
- Q-learning In Continuous Time (2022)
- Blending MPC & Value Function Approximation For Efficient Reinforcement Learning (2020)