Divergence-augmented Policy Optimization
2025 · Qing Wang, Yingru Li, Jiechao Xiong, et al.
Abstract
In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
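To make the penalty term concrete, the following is a minimal sketch, not the authors' implementation. It assumes the negative-entropy potential, for which the Bregman divergence between two probability vectors reduces to the KL divergence, and it works on the action probabilities of a single state rather than on the state distributions the abstract describes. The names `bregman_divergence`, `penalized_objective`, and the `weight` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def bregman_divergence(p, q, potential, grad_potential):
    """Generic Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(potential(p) - potential(q) - np.dot(grad_potential(q), p - q))


def neg_entropy(p):
    """Potential F(p) = sum_i p_i log p_i (assumed choice; entries must be > 0)."""
    return float(np.sum(p * np.log(p)))


def grad_neg_entropy(p):
    """Gradient of the negative-entropy potential: log p_i + 1."""
    return np.log(p) + 1.0


def kl_divergence(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def penalized_objective(current, behavior, advantages, weight):
    """Hypothetical single-state surrogate: expected advantage under the
    current policy minus a weighted divergence from the behavior policy."""
    current = np.asarray(current, dtype=float)
    behavior = np.asarray(behavior, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    # Importance-weighted expectation over behavior-policy actions;
    # algebraically this equals sum(current * advantages).
    surrogate = float(np.sum(behavior * (current / behavior) * advantages))
    penalty = bregman_divergence(current, behavior, neg_entropy, grad_neg_entropy)
    return surrogate - weight * penalty


behavior = np.array([0.5, 0.3, 0.2])    # policy that generated the data
current = np.array([0.6, 0.25, 0.15])   # policy being optimized
advantages = np.array([1.0, -0.5, 0.2])

print(bregman_divergence(current, behavior, neg_entropy, grad_neg_entropy))
print(kl_divergence(current, behavior))
print(penalized_objective(current, behavior, advantages, weight=0.1))
```

The first two printed values agree because both vectors sum to one. A larger `weight` pulls the maximizer of this objective closer to the behavior policy, which is the "small and safe update" effect the abstract refers to.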
Related papers
- Interpolated Policy Gradient: Merging On-policy And Off-policy Gradient Estimation For Deep Reinforcement Learning (2017) · 0.00
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019) · 0.00
- Policy Augmentation: An Exploration Strategy For Faster Convergence Of Deep Reinforcement Learning Algorithms (2021) · 2.26
- Bregman Gradient Policy Optimization (2021) · 0.00
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019) · 8.35
- Dropout Strategy In Reinforcement Learning: Limiting The Surrogate Objective Variance In Policy Optimization Methods (2023) · 0.00
- Bootstrap Advantage Estimation For Policy Optimization In Reinforcement Learning (2022) · 0.00
- Proximal Policy Optimization Algorithms (2017) · 0.00