How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models?
2026 · Xiaoyuan Cheng, Wenxuan Yuan, Boyang Li, et al.
Abstract
Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on reward maximization in offline settings, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based models, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, using it directly destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, yielding a stabilized policy generation and training process without altering the distribution
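The local-convexification idea can be illustrated with a toy one-dimensional problem. The sketch below is not the ALGD algorithm and does not involve a diffusion model; the objective `-a**2`, the cost `a - 1 <= 0`, and the penalty weight `RHO = 5.0` are hypothetical choices made only for illustration, and the augmented form is the textbook inequality-constrained augmented Lagrangian, which may differ from the one used in the paper. It compares the curvature of the plain and augmented Lagrangians at the constrained optimum `a* = 1` with multiplier `lambda* = 2`.

```python
# Toy illustration only: hypothetical objective and cost, not the paper's setup.

def objective(a):
    # Negative reward: minimizing -a^2 means preferring large |a|.
    return -a ** 2

def cost(a):
    # Safety constraint is cost(a) <= 0, i.e. a <= 1.
    return a - 1.0

def lagrangian(a, lam):
    # Plain Lagrangian, read here as an "energy" over actions.
    return objective(a) + lam * cost(a)

def augmented_lagrangian(a, lam, rho):
    # Textbook augmented Lagrangian for an inequality constraint.
    shifted = max(0.0, lam + rho * cost(a))
    return objective(a) + (shifted ** 2 - lam ** 2) / (2.0 * rho)

def first_derivative(fn, a, h=1e-3):
    # Central finite difference.
    return (fn(a + h) - fn(a - h)) / (2.0 * h)

def second_derivative(fn, a, h=1e-3):
    # Central second finite difference.
    return (fn(a + h) - 2.0 * fn(a) + fn(a - h)) / (h * h)

A_STAR, LAM_STAR, RHO = 1.0, 2.0, 5.0  # KKT point of the toy problem, assumed rho

plain_curv = second_derivative(lambda a: lagrangian(a, LAM_STAR), A_STAR)
aug_curv = second_derivative(
    lambda a: augmented_lagrangian(a, LAM_STAR, RHO), A_STAR
)
aug_grad = first_derivative(
    lambda a: augmented_lagrangian(a, LAM_STAR, RHO), A_STAR
)

print("plain Lagrangian curvature at a*:", plain_curv)    # approximately -2
print("augmented Lagrangian curvature at a*:", aug_curv)  # approximately rho - 2 = 3
print("augmented Lagrangian gradient at a*:", aug_grad)   # approximately 0
```

In this toy case the plain Lagrangian has negative curvature at the optimum, so gradient steps on it move away from `a*`, while the augmented version has positive curvature there for any `rho > 2` and keeps the same stationary point. The effect is local: far from `a*` the toy energy is still non-convex.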
Related papers
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Lrt-diffusion: Calibrated Risk-aware Guidance For Diffusion Policies (2025)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- Decoupled Guidance Diffusion For Adaptive Offline Safe Reinforcement Learning (2026)
- Dichotomous Diffusion Policy Optimization (2025)