Decoupled Guidance Diffusion For Adaptive Offline Safe Reinforcement Learning
2026 · Rufeng Chen, Zhaofan Zhang, Zhejiang Yang, et al.
Abstract
arXiv:2605.02777v2
Offline safe reinforcement learning often requires policies to adapt at deployment time to safety budgets that vary across episodes or change within a single episode. While diffusion-based planners enable flexible trajectory generation, existing guidance schemes often treat reward improvement and constraint satisfaction as competing gradient objectives, which can lead to unreliable safety compliance under cost limits. We reinterpret adaptive safe trajectory generation as sampling from a constrained trajectory distribution, where the budget restricts the trajectory region, and reward shapes preferences within that region. This perspective motivates Safe Decoupled Guidance Diffusion (SDGD), which conditions classifier-free guidance on the cost limit to bias sampling toward trajectories satisfying the specified limit, while using reward-gradient guidance to refine trajectories for higher return. Because direct reward guidance can
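The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it combines the standard classifier-free guidance rule (conditioned on a cost limit) with a standard classifier-style gradient correction (using a reward gradient) on a single noise estimate. The names `toy_eps_model`, `toy_reward_grad`, `guided_epsilon`, the toy reward, and the values of `w` and `scale` are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

def cfg_epsilon(eps_cond, eps_uncond, w):
    # Classifier-free guidance: extrapolate from the unconditional noise
    # estimate toward the cost-limit-conditioned one with weight w.
    return eps_uncond + w * (eps_cond - eps_uncond)

def reward_guided_epsilon(eps, reward_grad, sigma_t, scale):
    # Classifier-style guidance: shift the noise estimate against the
    # reward gradient so the denoised sample moves toward higher reward.
    return eps - scale * sigma_t * reward_grad

def toy_eps_model(x, cost_limit):
    # Hypothetical stand-in for a trained denoiser. Passing None as the
    # cost limit plays the role of the unconditional branch.
    return x if cost_limit is None else x - cost_limit

def toy_reward_grad(x):
    # Gradient of the assumed toy reward R(x) = -0.5 * ||x||^2.
    return -x

def guided_epsilon(x, cost_limit, sigma_t, w=2.0, scale=0.1):
    # Step 1: the cost limit enters only through the conditioning signal.
    eps_c = toy_eps_model(x, cost_limit)
    eps_u = toy_eps_model(x, None)
    eps = cfg_epsilon(eps_c, eps_u, w)
    # Step 2: the reward enters only through a gradient correction.
    return reward_guided_epsilon(eps, toy_reward_grad(x), sigma_t, scale)

x = np.array([1.0, -2.0])
eps = guided_epsilon(x, cost_limit=0.5, sigma_t=0.5)
```

In this sketch the cost limit and the reward act through separate terms, so changing the budget at deployment only changes the conditioning input, while the reward term refines the estimate produced under that budget.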
Related papers
- Modular Diffusion Policy Training: Decoupling And Recombining Guidance And Diffusion For Offline RL (2025) 0.00
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026) 0.00
- LRT-Diffusion: Calibrated Risk-aware Guidance For Diffusion Policies (2025) 0.00
- Diffusion Models For Offline Multi-agent Reinforcement Learning With Safety Constraints (2024) 0.00
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024) 1.81
- Advantage-guided Diffusion For Model-based Reinforcement Learning (2026) 0.00
- GTA: Generative Trajectory Augmentation With Guidance For Offline Reinforcement Learning (2024) 6.62
- BiTrajDiff: Bidirectional Trajectory Generation With Diffusion Models For Offline Reinforcement Learning (2025) 0.00