Steering Your Diffusion Policy With Latent Space Reinforcement Learning
2025 · Andrew Wagenmaker, Mitsuhiko Nakamoto, Yunchu Zhang, et al.
Abstract
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is hig
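To make the idea of "running RL over the latent-noise space" concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `frozen_policy`, `reward`, and `steer_latent_noise` are hypothetical names, the denoiser is a fixed closed-form map standing in for a pretrained diffusion policy, and a simple hill-climbing search stands in for the RL algorithm. The one property it is meant to illustrate is that the BC policy's weights stay frozen while only the input noise is adapted.

```python
import numpy as np

def frozen_policy(state, noise):
    """Hypothetical stand-in for a pretrained diffusion policy's denoising
    chain: a fixed, deterministic map from (state, initial noise) to an action."""
    return np.tanh(0.8 * noise + 0.1 * state)

def reward(action, target=0.6):
    """Toy task reward (assumption): actions closer to `target` score higher."""
    return -float(np.sum((action - target) ** 2))

def steer_latent_noise(state, dim=2, iters=200, step=0.3, seed=0):
    """Hill-climb over the latent noise only; the frozen policy is never updated.
    Random search is used here as a simple stand-in for an RL algorithm."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)  # BC default: noise drawn from N(0, I)
    baseline = reward(frozen_policy(state, w))
    best_w, best_r = w, baseline
    for _ in range(iters):
        cand = best_w + step * rng.standard_normal(dim)  # perturb the latent noise
        r = reward(frozen_policy(state, cand))
        if r > best_r:  # keep a candidate only if it improves the reward
            best_w, best_r = cand, r
    return best_w, baseline, best_r

state = np.zeros(2)
w_star, baseline_r, steered_r = steer_latent_noise(state)
print(f"baseline reward {baseline_r:.3f} -> steered reward {steered_r:.3f}")
```

In this sketch the steered reward can never fall below the baseline, because a candidate noise vector is accepted only when it improves on the current best. A real setting would replace the search with a learned, state-conditioned noise policy.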
Related papers
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- DiWA: Diffusion Policy Adaptation With World Models (2025)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- Dichotomous Diffusion Policy Optimization (2025)
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)