World Models Via Policy-guided Trajectory Diffusion
2023 · Marc Rigter, Jun Yamada, Ingmar Posner
Abstract
World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e. "in imagination". Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy. Prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We analyse the connections between PolyGRAD, score-based generative models, and classifier-guided diffu
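The idea of refining a whole trajectory at once, rather than rolling it out step by step, can be illustrated with a toy sketch. This is not the PolyGRAD algorithm or the authors' code. It assumes a 1-D linear system, a Gaussian policy, and a hand-written state predictor standing in for a learned denoiser; every function name, the noise schedule, and all parameter values below are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_policy_score(states, actions, gain=-0.5, sigma=0.3):
    """Gradient of log N(a | gain * s, sigma^2) with respect to the action a."""
    return -(actions - gain * states) / sigma**2

def toy_state_predictor(actions, s0):
    """Hypothetical stand-in for a learned denoiser: returns the states implied
    by the assumed dynamics s_{t+1} = s_t + a_t, starting from s0."""
    states = np.empty_like(actions)
    states[0] = s0
    states[1:] = s0 + np.cumsum(actions[:-1])
    return states

def policy_guided_sample(s0, horizon=8, steps=200, step_size=0.02, seed=0):
    """Jointly refine a random trajectory of states and actions."""
    rng = np.random.default_rng(seed)
    states = rng.normal(size=horizon)   # initially random states
    actions = rng.normal(size=horizon)  # initially random actions
    for i in range(steps):
        # Assumed schedule: noise decays linearly to zero over the first half
        # of the steps; the remaining steps refine deterministically.
        noise_scale = max(0.0, 1.0 - 2.0 * i / steps)
        # Move every action along the gradient of the policy's log-density.
        actions = (actions
                   + step_size * gaussian_policy_score(states, actions)
                   + np.sqrt(2.0 * step_size) * noise_scale * rng.normal(size=horizon))
        # Re-predict all states for the current actions in one pass.
        states = (toy_state_predictor(actions, s0)
                  + 0.1 * noise_scale * rng.normal(size=horizon))
    return states, actions

states, actions = policy_guided_sample(s0=1.0)
print(np.round(states, 3))
print(np.round(actions, 3))
```

In this toy setting the returned actions end up close to the policy mean `gain * states` and the states are consistent with the assumed dynamics, even though no state was ever produced by feeding a previous prediction back in one step at a time. A learned denoiser and a learned policy would replace the two hand-written functions in any realistic setting.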
Related papers
- Policy-guided Diffusion (2024)
- Imagine-2-drive: Leveraging High-fidelity World Models Via Multi-modal Diffusion Policies (2024)
- Advantage-guided Diffusion For Model-based Reinforcement Learning (2026)
- Recurrent World Models Facilitate Policy Evolution (2018)
- Low-variance Policy Gradient Estimation With World Models (2020)
- Genpo: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- Diwa: Diffusion Policy Adaptation With World Models (2025)