Diffusion Policy Through Conditional Proximal Policy Optimization
2026 Β· Ben Liu, Shunpeng Yang, Hua Chen
Abstract
Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multi-modal behaviors, enabling more diverse and flexible action generation compared to the conventional Gaussian policy. Despite various attempts to combine RL with diffusion, a key challenge is the difficulty of computing action log-likelihood under the diffusion model. This greatly hinders the direct application of diffusion policies in on-policy reinforcement learning. Most existing methods calculate or approximate the log-likelihood through the entire denoising process in the diffusion model, which can be memory- and computationally inefficient. To overcome this challenge, we propose a novel and efficient method to train a diffusion policy in an on-policy setting that requires only evaluating a simple Gaussian probability. This is achieved by aligning the policy iteration with the dif
Authors
(none)
Tags
Stats
Related papers
- Dichotomous Diffusion Policy Optimization (2025)0.00
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)0.00
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)0.00
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)0.00
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)0.00
- Genpo: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)0.00
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)8.04
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026)0.00