Deterministic Value-policy Gradients
2019 · Qingpeng Cai, Ling Pan, Pingzhong Tang
Abstract
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which the number of rollout steps of the analytical gradients through the learned model trades off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic
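To make the role of the rollout length concrete, here is a minimal sketch of a k-step value gradient computed analytically through a model. It is not the paper's DVG algorithm: the scalar linear model `s' = a*s + b*u`, the linear policy `u = theta*s`, the quadratic reward, the quadratic terminal critic, and every name and default value below are assumptions chosen only for illustration. With `k = 0` the gradient comes entirely from the critic, as in a DDPG-style update at a single state; larger `k` routes more of the gradient through the model.

```python
def rollout_value_and_grad(theta, s0, k, gamma=0.9, a=0.9, b=0.5,
                           w_s=-1.0, w_u=-1.0):
    """Return the k-step bootstrapped return and its derivative w.r.t. theta.

    Illustrative assumptions (none taken from the paper):
      model:   s' = a*s + b*u          (stand-in for a learned model)
      policy:  u  = theta*s            (deterministic, one parameter)
      reward:  r  = -(s^2 + u^2)
      critic:  Q(s, u) = w_s*s^2 + w_u*u^2, applied after k model steps
    """
    s, ds = s0, 0.0            # state and d(state)/d(theta)
    value, grad = 0.0, 0.0
    discount = 1.0
    for _ in range(k):
        u = theta * s
        du = s + theta * ds    # product rule on u = theta*s
        value += discount * -(s * s + u * u)
        grad += discount * -(2.0 * s * ds + 2.0 * u * du)
        # Push both the state and its derivative through the model.
        s, ds = a * s + b * u, a * ds + b * du
        discount *= gamma
    # Bootstrap the remaining (infinite-horizon) return from the critic.
    u = theta * s
    du = s + theta * ds
    value += discount * (w_s * s * s + w_u * u * u)
    grad += discount * (2.0 * w_s * s * ds + 2.0 * w_u * u * du)
    return value, grad


if __name__ == "__main__":
    for k in (0, 1, 5):
        print(k, rollout_value_and_grad(theta=-0.3, s0=1.0, k=k))
```

In this sketch the derivative is propagated by hand alongside the state; an automatic differentiation library would do the same bookkeeping, and a neural model, policy, and critic would replace the closed-form stand-ins.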
Related papers
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Deterministic Policy Gradients With General State Transitions (2018)
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)
- ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm For Sparse Reward Continuous Control (2024)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)