Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models
2024 · Zeyu Fang, Tian Lan
Abstract
Generative models such as diffusion models have been employed as world models in offline reinforcement learning to generate synthetic data for more effective learning. Existing work either generates the diffusion model once, prior to training, or requires additional interaction data to update it. In this paper, we propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation. It iteratively leverages a guided diffusion world model to directly evaluate the offline target policy with actions drawn from it, and then performs an importance-sampled world model update to adaptively align the world model with the updated policy. We analyze the performance of the proposed method and provide an upper bound on the return gap between our method and the real environment under an optimal policy. The result sheds light on various factors affecting learning performance. Evaluations in the D4RL environment show significant improvement over state-of
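As a rough illustration of the importance-sampling idea mentioned in the abstract, the sketch below reweights per-sample model losses by the ratio of target-policy to behavior-policy action probabilities. It is not the authors' implementation: the function names, the clipping threshold, the self-normalization, and the assumption that log-probabilities under both policies are available are all hypothetical choices made for this example.

```python
import math

def importance_weights(logp_target, logp_behavior, clip=10.0):
    """Clipped, self-normalized ratios pi(a|s) / beta(a|s).

    logp_target / logp_behavior: log-probabilities of the dataset actions
    under the current target policy and the data-collecting policy
    (assumed to be available; hypothetical inputs for this sketch).
    """
    ratios = [min(math.exp(lt - lb), clip)
              for lt, lb in zip(logp_target, logp_behavior)]
    total = sum(ratios)
    return [r / total for r in ratios]

def weighted_loss(per_sample_losses, weights):
    """Importance-weighted average of per-sample world-model losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# Usage: samples whose actions the target policy favors count more.
w = importance_weights([math.log(2.0), 0.0], [0.0, 0.0])
loss = weighted_loss([1.0, 3.0], w)
```

Clipping and self-normalization are common variance-reduction choices for importance weights; whether the paper uses either is not stated in the abstract.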
Related papers
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- DiffPoGAN: Diffusion Policies With Generative Adversarial Networks For Offline Reinforcement Learning (2024)
- Diffusion World Model: Future Modeling Beyond Step-by-step Rollout For Offline Reinforcement Learning (2024)
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Diffusion Models For Reinforcement Learning: A Survey (2023)
- DiWA: Diffusion Policy Adaptation With World Models (2025)
- Continual Offline Reinforcement Learning Via Diffusion-based Dual Generative Replay (2024)