Modular Diffusion Policy Training: Decoupling And Recombining Guidance And Diffusion For Offline RL
2025 · Zhaoyang Chen, Cody Fleming
Abstract
Classifier-free guidance has shown strong potential in diffusion-based reinforcement learning. However, existing methods train the guidance module and the diffusion model jointly, which can be suboptimal in the early stages, when the guidance is inaccurate and provides noisy learning signals. In offline RL, guidance depends solely on the offline data (observations, actions, and rewards) and is independent of the policy module's behavior, suggesting that joint training is not required. This paper proposes modular training methods that decouple the guidance module from the diffusion model, based on three key findings: (1) Guidance Necessity: We explore how the effectiveness of guidance varies with the training stage and algorithm choice, uncovering the roles of guidance and diffusion. A lack of good guidance in the early stage presents an opportunity for optimization. (2) Guidance-First Diffusion Training: We introduce a method where the guidance module is first trained independen
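The decoupling described in the abstract can be illustrated with a minimal two-stage schedule on a toy problem. This is a sketch under stated assumptions, not the authors' implementation: the one-dimensional bandit dataset, the quadratic guidance model, and every function name below are hypothetical, and a single-parameter policy stands in for the diffusion model. The only point it shows is the ordering: the guidance is fitted on offline data alone in stage 1, then held fixed while the policy is trained in stage 2.

```python
import random


def make_offline_data(n=200, seed=0):
    """Toy offline dataset (assumption): 1-D actions with reward -(a - 0.5)^2."""
    rng = random.Random(seed)
    actions = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    rewards = [-(a - 0.5) ** 2 for a in actions]
    return actions, rewards


def features(a):
    """Quadratic features for the hypothetical guidance model Q(a)."""
    return (1.0, a, a * a)


def train_guidance(actions, rewards, steps=1000, lr=0.5):
    """Stage 1: fit Q(a) = w0 + w1*a + w2*a^2 to rewards by gradient descent.

    Uses only the offline data; no policy is involved at this stage.
    """
    w = [0.0, 0.0, 0.0]
    n = len(actions)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, r in zip(actions, rewards):
            phi = features(a)
            err = sum(wi * pi for wi, pi in zip(w, phi)) - r
            for k in range(3):
                grad[k] += err * phi[k] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return tuple(w)  # returned as a tuple so stage 2 cannot modify it


def guidance_grad(w, a):
    """dQ/da for the quadratic guidance model."""
    return w[1] + 2.0 * w[2] * a


def train_policy(actions, w, eta=1.0, steps=500, lr=0.1):
    """Stage 2: train a stand-in policy (one parameter, mu) with frozen guidance.

    Minimizes 0.5 * mean((mu - a_i)^2) - eta * Q(mu): a behavior-cloning term
    plus a guidance term. A real system would train a diffusion model here.
    """
    data_mean = sum(actions) / len(actions)
    mu = 0.0
    for _ in range(steps):
        grad = (mu - data_mean) - eta * guidance_grad(w, mu)
        mu -= lr * grad
    return mu


if __name__ == "__main__":
    acts, rews = make_offline_data()
    w = train_guidance(acts, rews)   # stage 1: guidance only
    mu = train_policy(acts, w)       # stage 2: policy with frozen guidance
    print("guidance weights:", w)
    print("policy mean action:", mu)
```

In this toy setting the learned mean action settles between the dataset mean and the reward-maximizing action 0.5, with `eta` controlling how strongly the frozen guidance pulls the policy away from pure behavior cloning.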
Related papers
- Decoupled Guidance Diffusion For Adaptive Offline Safe Reinforcement Learning (2026) 0.00
- LRT-Diffusion: Calibrated Risk-aware Guidance For Diffusion Policies (2025) 0.00
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024) 0.00
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024) 8.04
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) 0.00
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) 0.00
- MADiff: Offline Multi-agent Learning With Diffusion Models (2023) 2.26
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026) 0.00