Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning
2024 · Hanye Zhao, Xiaoshen Han, Zhengbang Zhu, et al.
Abstract
With the great success of diffusion models (DMs) in generating realistic synthetic vision data, many researchers have investigated their potential in decision-making and control. Most of these works used DMs to sample directly from the trajectory space, where DMs can be viewed as a combination of dynamics models and policies. In this work, we explore how to decouple DMs' ability to act as dynamics models in fully offline settings, allowing the learning policy to roll out trajectories. Because DMs learn the data distribution from the dataset, their intrinsic policy is actually the behavior policy induced from the dataset, which results in a mismatch between the behavior policy and the learning policy. We propose Dynamics Diffusion, DyDiff for short, which can inject information from the learning policy into DMs iteratively. DyDiff ensures long-horizon rollout accuracy while maintaining policy consistency, and it can easily be deployed with model-free algorithms. We provide theoretical analysis to show t
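To make the idea of "injecting information from the learning policy iteratively" concrete, here is a toy sketch of one plausible loop: generate a state sequence conditioned on the current action sequence, relabel the actions with the learning policy, and repeat. This is an illustration under stated assumptions, not the authors' implementation. `generate_states` is a hypothetical stand-in for an action-conditioned diffusion model (it just integrates deterministic toy dynamics `s_{t+1} = s_t + a_t`), and `learning_policy` is a hypothetical linear policy; the actual DyDiff procedure may differ in its conditioning and schedule.

```python
import numpy as np

def generate_states(s0, actions):
    """HYPOTHETICAL stand-in for an action-conditioned diffusion model.

    A real DM would denoise the whole state sequence jointly; here we simply
    integrate toy deterministic dynamics s_{t+1} = s_t + a_t so the loop runs.
    """
    states = [np.asarray(s0, dtype=float)]
    for a in actions:
        states.append(states[-1] + a)
    return np.stack(states)  # shape (H + 1, state_dim)

def learning_policy(states):
    """HYPOTHETICAL learning policy: move each state halfway toward the origin."""
    return -0.5 * states

def iterative_rollout(s0, behavior_actions, n_iters):
    """Alternate state generation and action relabeling n_iters times."""
    # Start from the dataset (behavior-policy) actions.
    actions = np.asarray(behavior_actions, dtype=float)
    for _ in range(n_iters):
        states = generate_states(s0, actions)   # "model" step
        actions = learning_policy(states[:-1])  # inject learning-policy actions
    # Final rollout with the relabeled actions.
    states = generate_states(s0, actions)
    return states, actions

# Usage: a 5-step rollout starting from behavior-policy actions of all ones.
horizon = 5
s0 = np.array([1.0, -2.0])
behavior = np.ones((horizon, 2))
states, actions = iterative_rollout(s0, behavior, n_iters=horizon)
```

In this toy setting each iteration fixes one more leading action, so after `horizon` iterations the actions match what the learning policy would take on the generated states (here `states[t] == s0 * 0.5**t`). With a single iteration, the rollout still carries traces of the behavior policy, which is the mismatch the abstract describes.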
Related papers
- Diffusion World Model: Future Modeling Beyond Step-by-step Rollout For Offline Reinforcement Learning (2024)
- MADiff: Offline Multi-agent Learning With Diffusion Models (2023)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- BiTrajDiff: Bidirectional Trajectory Generation With Diffusion Models For Offline Reinforcement Learning (2025)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)