Diffusion World Model: Future Modeling Beyond Step-by-step Rollout For Offline Reinforcement Learning
2024 · Zihan Ding, Amy Zhang, Yuandong Tian, et al.
Abstract
We introduce Diffusion World Model (DWM), a conditional diffusion model that predicts multistep future states and rewards concurrently. Unlike traditional one-step dynamics models, DWM produces long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models, with a 44% performance gain, and is comparable to or slightly surpasses their model-free counterparts.
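The value-estimation idea above can be illustrated with a generic H-step model-based value-expansion target, \(\sum_{t=0}^{H-1}\gamma^t r_t + \gamma^H Q(s_H, a_H)\), where the short-term rewards come from a world model and a learned Q-function bootstraps the tail. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper's exact horizon indexing and training details may differ, and the real DWM samples the whole future sequence from a conditional diffusion model. The functions `toy_step`, `toy_seq`, `policy`, and `q_fn` below are hypothetical deterministic stand-ins, chosen only to contrast H recursive one-step queries with a single multistep query.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative types (assumed): 1-D states and actions as float tuples.
State = Tuple[float, ...]
Action = Tuple[float, ...]


def value_expansion_target(
    rewards: Sequence[float],
    final_state: State,
    final_action: Action,
    q_fn: Callable[[State, Action], float],
    gamma: float = 0.99,
) -> float:
    """Generic H-step target: sum_{t<H} gamma^t r_t + gamma^H Q(s_H, a_H)."""
    horizon = len(rewards)
    short_term = sum(gamma ** t * r for t, r in enumerate(rewards))
    return short_term + gamma ** horizon * q_fn(final_state, final_action)


def rollout_one_step(step_fn, policy, s0: State, a0: Action, horizon: int):
    """Recursive rollout: the one-step model is queried `horizon` times,
    and each query consumes the model's own previous prediction."""
    s, a, rewards = s0, a0, []
    for _ in range(horizon):
        s, r = step_fn(s, a)
        rewards.append(r)
        a = policy(s)
    return rewards, s, a


def rollout_sequence(seq_fn, policy, s0: State, a0: Action, horizon: int):
    """DWM-style rollout: one query returns all `horizon` future states
    and rewards; only the bootstrap action comes from the policy."""
    states, rewards = seq_fn(s0, a0, horizon)
    s_final = states[-1]
    return list(rewards), s_final, policy(s_final)


# --- Hypothetical toy stand-ins (not the paper's models) ---
calls: Dict[str, int] = {"one_step": 0, "sequence": 0}


def toy_step(s: State, a: Action) -> Tuple[State, float]:
    calls["one_step"] += 1
    return (s[0] + a[0],), 1.0


def toy_seq(s: State, a: Action, horizon: int) -> Tuple[List[State], List[float]]:
    calls["sequence"] += 1
    # Same trajectory toy_step produces when the action stays constant.
    states = [(s[0] + a[0] * (t + 1),) for t in range(horizon)]
    return states, [1.0] * horizon


def policy(s: State) -> Action:
    return (1.0,)


def q_fn(s: State, a: Action) -> float:
    return 10.0


one = rollout_one_step(toy_step, policy, (0.0,), (1.0,), horizon=3)
seq = rollout_sequence(toy_seq, policy, (0.0,), (1.0,), horizon=3)
print(value_expansion_target(*one, q_fn, gamma=0.5))  # 3.0
print(value_expansion_target(*seq, q_fn, gamma=0.5))  # 3.0
print(calls)  # {'one_step': 3, 'sequence': 1}
```

Both rollouts give the same target here only because the toy models agree exactly; in practice, errors in a one-step model can compound across recursive queries, which is the motivation for predicting the whole horizon in one pass.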
Related papers
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- Imagine-2-drive: Leveraging High-fidelity World Models Via Multi-modal Diffusion Policies (2024)
- Diwa: Diffusion Policy Adaptation With World Models (2025)
- DIAR: Diffusion-model-guided Implicit Q-learning With Adaptive Revaluation (2024)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)
- Advantage-guided Diffusion For Model-based Reinforcement Learning (2026)