Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State
2025 · Ziheng Cheng, Xin Guo, Yufei Zhang
Abstract
The theory of continuous-time reinforcement learning (RL) has progressed rapidly in recent years. While the ultimate objective of RL is typically to learn deterministic control policies, most existing continuous-time RL methods rely on stochastic policies. Such approaches often require sampling actions at very high frequencies, and involve computationally expensive expectations over continuous action spaces, resulting in high-variance gradient estimates and slow convergence. In this paper, we introduce and develop deterministic policy gradient (DPG) methods for continuous-time RL. We derive a continuous-time policy gradient formula expressed as the expected gradient of an advantage rate function and establish a martingale characterization for both the value function and the advantage rate. These theoretical results provide tractable estimators for deterministic policy gradients in continuous-time RL. Building on this foundation, we propose a model-free continuous-time Deep Determinis
Related papers
- Deterministic Value-policy Gradients (2019)
- Deterministic Policy Gradients With General State Transitions (2018)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Why Policy Gradient Algorithms Work For Undiscounted Total-reward MDPs (2025)
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019)
- Policy Optimization For Continuous Reinforcement Learning (2023)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)