LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
2025 · Ximan Sun, Xiang Cheng
Abstract
Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatis
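The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes isotropic Gaussian denoising kernels with a shared standard deviation `sigma`, a toy constant mean shift between the two heads, and Monte Carlo calibration of tau as the (1 - alpha) quantile of the accumulated log-likelihood ratio under H0. All function names, the `temperature` parameter, and the toy simulator are hypothetical.

```python
import numpy as np


def step_llr(x, mu_uncond, mu_cond, sigma):
    """Per-step log-likelihood ratio log N(x; mu_cond) - log N(x; mu_uncond).

    Assumes both kernels are isotropic Gaussians with the same sigma,
    so the normalising constants cancel.
    """
    d_uncond = np.sum((x - mu_uncond) ** 2, axis=-1)
    d_cond = np.sum((x - mu_cond) ** 2, axis=-1)
    return (d_uncond - d_cond) / (2.0 * sigma ** 2)


def logistic_gate(llr, tau, temperature=1.0):
    """Soft gate in (0, 1): near 0 well below tau, near 1 well above it."""
    return 1.0 / (1.0 + np.exp(-(llr - tau) / temperature))


def gated_mean(mu_uncond, mu_cond, gate):
    """Interpolate from the unconditional mean toward the conditional mean."""
    return mu_uncond + gate[..., None] * (mu_cond - mu_uncond)


def simulate_h0_llr(rng, n_samples, n_steps, dim, sigma=1.0, shift=0.5):
    """Toy H0 simulator (an assumption): samples come from the unconditional
    kernel, and the conditional head differs by a constant mean shift."""
    mu_u = np.zeros((n_samples, dim))
    mu_c = np.full((n_samples, dim), shift)
    total = np.zeros(n_samples)
    for _ in range(n_steps):
        x = mu_u + sigma * rng.standard_normal((n_samples, dim))
        total += step_llr(x, mu_u, mu_c, sigma)
    return total


def calibrate_tau(h0_llr_totals, alpha):
    """Pick tau so that roughly a fraction alpha of H0 runs exceed it."""
    return float(np.quantile(h0_llr_totals, 1.0 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    totals = simulate_h0_llr(rng, n_samples=10_000, n_steps=10, dim=2)
    tau = calibrate_tau(totals, alpha=0.1)
    print("tau:", tau, "empirical Type-I rate:", np.mean(totals > tau))
```

In this sketch the threshold is computed once from H0 rollouts and then reused at sampling time, so the only user-facing knob is `alpha`. How the paper combines the gate with critic gradients is not specified in the abstract and is left out here.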
Related papers
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026)
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Decoupled Guidance Diffusion For Adaptive Offline Safe Reinforcement Learning (2026)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Dichotomous Diffusion Policy Optimization (2025)
- Modular Diffusion Policy Training: Decoupling And Recombining Guidance And Diffusion For Offline RL (2025)
- Steering Your Diffusion Policy With Latent Space Reinforcement Learning (2025)