Accuracy Of Discretely Sampled Stochastic Policies In Continuous-time Reinforcement Learning
2025 · Yanwei Jia, Du Ouyang, Yufei Zhang
Abstract
Stochastic policies (also known as relaxed controls) are widely used in continuous-time reinforcement learning algorithms. However, executing a stochastic policy and evaluating its performance in a continuous-time environment remain open challenges. This work introduces and rigorously analyzes a policy execution framework that samples actions from a stochastic policy at discrete time points and implements them as piecewise constant controls. We prove that as the sampling mesh size tends to zero, the controlled state process converges weakly to the dynamics with coefficients aggregated according to the stochastic policy. We explicitly quantify the convergence rate based on the regularity of the coefficients and establish an optimal first-order convergence rate for sufficiently regular coefficients. Additionally, we prove a \(1/2\)-order weak convergence rate that holds uniformly over the sampling noise with high probability, and establish a \(1/2\)-order pathwise convergence for each re
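The execution scheme described in the abstract can be illustrated with a minimal sketch. Everything specific below is an assumption chosen for illustration and is not taken from the paper: a one-dimensional state, drift b(x, a) = a, constant diffusion 0.2, and a Gaussian policy a ~ N(-0.5 x, 1). Under these choices the aggregated dynamics are dX = -0.5 X dt + 0.2 dW, whose mean at time T is x0 · exp(-0.5 T), so the Monte Carlo mean of the sampled scheme can be compared against that value as the mesh shrinks.

```python
import numpy as np


def simulate_sampled_policy(x0, T, n_steps, n_paths, rng):
    """Terminal states when actions are drawn on a time grid and held constant.

    Toy model (assumed, not from the paper):
      drift b(x, a) = a, diffusion sigma = 0.2, policy a ~ N(-0.5 * x, 1).
    """
    h = T / n_steps  # sampling mesh size
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        # Sample one action per path at the grid point, based on the current state.
        a = rng.normal(loc=-0.5 * x, scale=1.0)
        # Brownian increment over the interval of length h.
        dw = rng.normal(scale=np.sqrt(h), size=n_paths)
        # The action is frozen on the interval, so the drift is constant there
        # and this update is exact for the toy model.
        x = x + a * h + 0.2 * dw
    return x


def aggregated_mean(x0, T):
    """Mean of X_T under the aggregated dynamics dX = -0.5 X dt + 0.2 dW."""
    return x0 * np.exp(-0.5 * T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = aggregated_mean(1.0, 1.0)
    for n_steps in (2, 10, 100):
        x_T = simulate_sampled_policy(1.0, 1.0, n_steps, 20_000, rng)
        print(f"mesh {1.0 / n_steps:.3f}: |E[X_T] - target| ~ {abs(x_T.mean() - target):.4f}")
```

In this toy model the expected terminal state of the sampled scheme is x0 · (1 - 0.5 h)^(T/h), so the gap to the aggregated mean shrinks roughly linearly in the mesh size h, up to Monte Carlo noise. This only illustrates weak convergence for one test function and does not reproduce the paper's general rates.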
Related papers
- On The Sample Complexity And Metastability Of Heavy-tailed Policy Search In Continuous Control (2021)
- Autoregressive Policies For Continuous Control Deep Reinforcement Learning (2019)
- Linear Convergence Of A Policy Gradient Method For Some Finite Horizon Continuous Time Control Problems (2022)
- Discretizing Continuous Action Space With Unimodal Probability Distributions For On-policy Reinforcement Learning (2024)
- A Random Measure Approach To Reinforcement Learning In Continuous Time (2024)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Stochastic Resetting Accelerates Policy Convergence In Reinforcement Learning (2026)