Policy-guided Diffusion
2024 · Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, et al.
Abstract
In many real-world settings, agents must learn from an offline dataset gathered by some prior behavior policy. Such a setting naturally leads to distribution shift between the behavior policy and the target policy being trained, requiring policy conservatism to avoid instability and overestimation bias. Autoregressive world models offer a different solution by generating synthetic, on-policy experience. However, in practice, model rollouts must be severely truncated to avoid compounding error. As an alternative, we propose policy-guided diffusion. Our method uses diffusion models to generate entire trajectories under the behavior distribution, applying guidance from the target policy to move synthetic experience further on-policy. We show that policy-guided diffusion models a regularized form of the target distribution that balances action likelihood under both the target and behavior policies, leading to plausible trajectories with high target policy probability, while retain
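The guidance idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the trajectory diffusion model with a one-dimensional Gaussian over a single action, replaces the target policy with another Gaussian, and uses plain Langevin sampling instead of a learned denoiser. The function names, the Gaussian parameters, and the guidance coefficient `lam` are all assumptions made for illustration. Under those assumptions, adding the target policy's score to the behavior score samples from a distribution whose mean sits between the two, which is one simple reading of "balances action likelihood under both the target and behavior policies".

```python
import numpy as np

# Toy stand-ins (assumed, not from the paper): both distributions are 1-D Gaussians.
MU_B, SIGMA_B = 0.0, 1.0   # behavior action distribution
MU_T, SIGMA_T = 2.0, 0.5   # target policy


def behavior_score(a):
    """Gradient of the log-density of the behavior Gaussian with respect to a."""
    return -(a - MU_B) / SIGMA_B**2


def target_policy_score(a):
    """Gradient of the log-density of the target-policy Gaussian with respect to a."""
    return -(a - MU_T) / SIGMA_T**2


def guided_score(a, lam=1.0):
    """Behavior score plus lam times the target-policy score (the guidance term)."""
    return behavior_score(a) + lam * target_policy_score(a)


def regularized_mean(lam=1.0):
    """Closed-form mean of the density proportional to behavior * target**lam."""
    prec_b = 1.0 / SIGMA_B**2
    prec_t = lam / SIGMA_T**2
    return (prec_b * MU_B + prec_t * MU_T) / (prec_b + prec_t)


def langevin_sample(score_fn, n=5000, steps=500, step_size=0.01, seed=0):
    """Unadjusted Langevin dynamics: a <- a + eps * score(a) + sqrt(2 * eps) * noise."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    for _ in range(steps):
        noise = rng.standard_normal(n)
        a = a + step_size * score_fn(a) + np.sqrt(2.0 * step_size) * noise
    return a


if __name__ == "__main__":
    samples = langevin_sample(guided_score)
    print("empirical mean:", samples.mean())
    print("closed-form mean:", regularized_mean())
```

With `lam=0` the sampler reduces to the behavior distribution; larger values of `lam` pull samples toward the target policy while the behavior term keeps them from drifting arbitrarily far from the data.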
Related papers
- World Models Via Policy-guided Trajectory Diffusion (2023) · 0.00
- Don't Start From Scratch: Behavioral Refinement Via Interpolant-based Policy Diffusion (2024) · 9.28
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) · 0.00
- Streaming Diffusion Policy: Fast Policy Synthesis With Variable Noise Diffusion Models (2024) · 0.00
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024) · 0.00
- Advantage-guided Diffusion For Model-based Reinforcement Learning (2026) · 0.00
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025) · 0.00
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024) · 1.81