Interpolated Policy Gradient: Merging On-policy And Off-policy Gradient Estimation For Deep Reinforcement Learning
2017 · Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, et al.
Abstract
Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixing of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorit
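To make the idea of interpolating two gradient estimates concrete, here is a minimal sketch. It assumes the simplest possible form, a convex combination of an on-policy and an off-policy gradient vector. The function name `interpolate_gradients` and the mixing coefficient `nu` are hypothetical names chosen for illustration. This is not the paper's estimator, which additionally relies on control variates and a learned value function.

```python
from typing import List, Sequence


def interpolate_gradients(
    on_policy_grad: Sequence[float],
    off_policy_grad: Sequence[float],
    nu: float,
) -> List[float]:
    """Blend two gradient estimates as a convex combination.

    nu = 0.0 returns the on-policy estimate unchanged and nu = 1.0 returns
    the off-policy estimate unchanged. `nu` is an illustrative mixing
    coefficient (an assumption), not a value taken from the paper.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    if len(on_policy_grad) != len(off_policy_grad):
        raise ValueError("gradient estimates must have the same length")
    # Element-wise convex combination of the two estimates.
    return [
        (1.0 - nu) * g_on + nu * g_off
        for g_on, g_off in zip(on_policy_grad, off_policy_grad)
    ]


if __name__ == "__main__":
    # Toy gradient vectors, made up purely for illustration.
    g_on = [1.0, 2.0]
    g_off = [3.0, 4.0]
    for nu in (0.0, 0.5, 1.0):
        print(nu, interpolate_gradients(g_on, g_off, nu))
```

In this toy setting, intermediate values of `nu` trade the low bias of the on-policy estimate against the sample reuse of the off-policy one, which is the trade-off the abstract describes.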
Related papers
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)
- Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model (2021)
- Divergence-augmented Policy Optimization (2025)
- Batch Reinforcement Learning With A Nonparametric Off-policy Policy Gradient (2020)
- Off-policy Policy Gradient With State Distribution Correction (2019)
- Combining Policy Gradient And Q-learning (2016)
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023)