DiWA: Diffusion Policy Adaptation With World Models
2025 · Akshay L Chandra, Iman Nematollahi, Chenguang Huang, et al.
Abstract
Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence for each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Although prior work frames the denoising process in diffusion policies as a Markov Decision Process to enable RL-based updates, its strong dependence on environment interaction remains highly inefficient. To bridge this gap, we introduce DiWA, a novel framework that leverages a world model for fine-tuning diffusion-based robotic skills entirely offline with reinforcement learning. Unlike model-free approaches that require millions of environment interactions to fine-tune a repertoire of robot skills, DiWA achieves effective adaptation using a world model trained once on a few hundred thousand offline play interactions. This results in dramatically improved sample efficiency
Related papers
- Steering Your Diffusion Policy With Latent Space Reinforcement Learning (2025)
- Imagine-2-drive: Leveraging High-fidelity World Models Via Multi-modal Diffusion Policies (2024)
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Avoiding Mode Collapse In Diffusion Models Fine-tuned With Reinforcement Learning (2024)
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- DIAR: Diffusion-model-guided Implicit Q-learning With Adaptive Revaluation (2024)