Deterministic Policy Gradients With General State Transitions
2018 · Qingpeng Cai, Ling Pan, Pingzhong Tang
Abstract
We study a reinforcement learning setting in which the state transition function is a convex combination of a stochastic continuous function and a deterministic function. Such a setting generalizes the widely studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG). We first give a simple example to illustrate that the deterministic policy gradient may be infinite under deterministic state transitions, and introduce a theoretical technique to prove the existence of the policy gradient in this generalized setting. Using this technique, we prove that the deterministic policy gradient indeed exists for a certain set of discount factors, and further establish two conditions that guarantee its existence for all discount factors. We then derive a closed form of the policy gradient whenever it exists. Furthermore, to overcome the challenge of high sample complexity of DPG in this setting, we propose the Generalized Deterministic Policy Gradient (GDPG)
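The transition setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `make_mixed_transition`, `stochastic_step`, `deterministic_step`, and `weight`, as well as the 1-D dynamics and the noise scale, are hypothetical and chosen only for illustration. The sketch samples the next state from the stochastic kernel with probability `weight` and from the deterministic map otherwise, which is one way to realize a convex combination of the two.

```python
import random


def make_mixed_transition(stochastic_step, deterministic_step, weight, rng=None):
    """Build step(state, action) that samples from the mixture
    weight * stochastic + (1 - weight) * deterministic.

    All names here are illustrative assumptions, not taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    rng = rng or random.Random()

    def step(state, action):
        # With probability `weight`, draw from the stochastic kernel;
        # otherwise apply the deterministic map.
        if rng.random() < weight:
            return stochastic_step(state, action, rng)
        return deterministic_step(state, action)

    return step


# Hypothetical 1-D dynamics, used only to exercise the mixture.
def deterministic_step(state, action):
    return state + action


def stochastic_step(state, action, rng):
    # Gaussian noise around the deterministic next state (assumed scale 0.1).
    return state + action + rng.gauss(0.0, 0.1)


# weight = 0.0 recovers purely deterministic transitions;
# weight = 1.0 recovers purely stochastic transitions.
step_det = make_mixed_transition(stochastic_step, deterministic_step, weight=0.0)
step_mix = make_mixed_transition(
    stochastic_step, deterministic_step, weight=0.5, rng=random.Random(0)
)
```

Calling `step_det(1.0, 0.5)` always returns the deterministic next state, while `step_mix` returns either that value or a noisy one depending on the draw.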
Related papers
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Deterministic Value-policy Gradients (2019)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Zeroth-order Deterministic Policy Gradient (2020)
- Why Policy Gradient Algorithms Work For Undiscounted Total-reward Mdps (2025)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)
- ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm For Sparse Reward Continuous Control (2024)
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019)