Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps
2025 · Ningyuan Yang, Jiaxuan Gao, Feng Gao, et al.
Abstract
Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, can learn diverse skills from demonstration data thanks to their high representation power. However, demonstration data that is sub-optimal or limited in coverage can lead diffusion policies to generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising way to address these limitations, existing approaches struggle to adapt Proximal Policy Optimization (PPO) to diffusion models effectively. The difficulty stems from the computational intractability of estimating action likelihoods through the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of diffusion policies is much less sample-efficient than directly applying PPO to MLP policies (MLP+PPO).
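To make the intractability claim concrete, here is a minimal sketch in standard DDPM-style notation. The symbols are assumptions for illustration and are not taken from the paper: $s$ is the state, $a^k$ is the action after denoising step $k$ (with $a^K$ pure noise and $a^0$ the executed action), and $K$ is the number of denoising steps.

```latex
% PPO needs the likelihood ratio of the executed action a^0:
\[
r_\theta(s, a^0) \;=\; \frac{\pi_\theta(a^0 \mid s)}{\pi_{\theta_{\mathrm{old}}}(a^0 \mid s)}
\]
% For a diffusion policy, that likelihood is a marginal over every
% intermediate denoising state a^1, ..., a^K:
\[
\pi_\theta(a^0 \mid s) \;=\; \int p(a^K) \prod_{k=1}^{K} p_\theta\!\left(a^{k-1} \mid a^k, s\right) \, da^{1:K}
\]
```

Each per-step transition $p_\theta(a^{k-1} \mid a^k, s)$ is typically a Gaussian with a closed-form density, but the integral over $a^{1:K}$ generally has no closed form, so the ratio $r_\theta$ cannot be evaluated directly the way it can for an MLP policy with a single Gaussian output.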
Related papers
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Dichotomous Diffusion Policy Optimization (2025)
- GenPO: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)
- Using Human Feedback To Fine-tune Diffusion Models Without Any Reward Model (2023)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- DiWA: Diffusion Policy Adaptation With World Models (2025)
- Steering Your Diffusion Policy With Latent Space Reinforcement Learning (2025)