Advantage-guided Diffusion For Model-based Reinforcement Learning
2026 · Daniele Foffano, Arvid Eriksson, David Broman, et al.
Abstract
Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent's advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window. We develop two guides: (i) Sigmoid Advantage Guidance (SAG) and (ii) Exponential Advantage Guidance (EAG). We prove that a diffusion model guided through SAG or EAG allows us to perform reweighted sampling of trajectories with weights increasing in state-action advantage, implying policy improvement under standard assumptions. Additionally, we show that the trajectories generated
Related papers
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026)
- Policy-guided Diffusion (2024)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- World Models Via Policy-guided Trajectory Diffusion (2023)
- Diffusion Models For Reinforcement Learning: A Survey (2023)
- Lrt-diffusion: Calibrated Risk-aware Guidance For Diffusion Policies (2025)
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Modular Diffusion Policy Training: Decoupling And Recombining Guidance And Diffusion For Offline RL (2025)