Dichotomous Diffusion Policy Optimization
2025 · Ruiming Liang, Yinan Zheng, Kexin Zheng, et al.
Abstract
Diffusion-based policies have become increasingly popular for solving a wide range of decision-making tasks, owing to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion policies with reinforcement learning (RL) remains challenging. Existing methods either train unstably because they directly maximize value objectives, or incur heavy computational cost because they rely on a crude Gaussian likelihood approximation, which requires a large number of sufficiently small denoising steps. In this work, we propose DIPOLE (Dichotomous diffusion Policy improvement), a novel RL algorithm designed for stable and controllable diffusion policy optimization. We begin by revisiting the KL-regularized objective in RL, which offers a desirable weighted regression objective for diffusion policy extraction, but often struggles to balance greediness and stability. We then formulate a greedified policy regularization scheme, which naturally
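As an illustration of the weighted regression objective mentioned in the abstract: the standard KL-regularized problem, maximizing expected advantage minus β times the KL divergence to a reference policy μ, has the well-known closed-form solution π*(a|s) ∝ μ(a|s) exp(A(s,a)/β), which can be fit by regressing on reference-policy samples weighted by exp(A/β). The sketch below shows that generic weighting applied to a denoising loss. It is not DIPOLE's dichotomous scheme, which the truncated abstract does not specify. The function names, the clipping threshold `max_weight`, and the toy inputs are assumptions made for illustration only.

```python
import numpy as np


def kl_regularized_weights(advantages, beta=1.0, max_weight=100.0):
    """Exponential-advantage weights exp(A / beta), clipped for stability.

    Hypothetical helper: beta is the KL-regularization temperature, and
    max_weight is an assumed clipping threshold, not a value from the paper.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    weights = np.exp(advantages / beta)
    return np.minimum(weights, max_weight)


def weighted_denoising_loss(pred_noise, true_noise, weights):
    """Squared denoising error per sample, averaged with the given weights."""
    pred_noise = np.asarray(pred_noise, dtype=np.float64)
    true_noise = np.asarray(true_noise, dtype=np.float64)
    per_sample = np.mean((pred_noise - true_noise) ** 2, axis=-1)
    return float(np.sum(weights * per_sample) / np.sum(weights))
```

A smaller β makes the weights greedier, concentrating the loss on high-advantage actions, but the unclipped weights grow exponentially. That is one way to see the tension between greediness and stability the abstract describes.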
Related papers
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)
- GenPO: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)