Goal-conditioned Imitation Learning Using Score-based Diffusion Policies
2023 · Moritz Reuss, Maximilian Li, Xiaogang Jia, et al.
Abstract
We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "BEhavior generation with ScOre-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process and hence allows for fast sampling strategies that generate goal-specified behavior in just 3 denoising steps, compared to the 30+ steps of other diffusion-based policies. Furthermore, BESO is highly expressive and can effectively capture the multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additio
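To make the idea of decoupling the learned model from the sampler more concrete, the sketch below runs a deterministic Euler sampler for three denoising steps over a Karras-style noise schedule. It is not BESO's implementation: `karras_sigmas`, `toy_denoiser`, `sample_action`, the noise range, and the two-mode toy dataset are all hypothetical choices made for illustration, and the closed-form `toy_denoiser` merely stands in for a trained goal-conditioned network.

```python
import numpy as np

def karras_sigmas(n_steps, sigma_min=0.01, sigma_max=10.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0.

    The range and rho are assumed values, not taken from the paper.
    """
    ramp = np.linspace(0.0, 1.0, n_steps)
    inv_rho_max = sigma_max ** (1.0 / rho)
    inv_rho_min = sigma_min ** (1.0 / rho)
    sigmas = (inv_rho_max + ramp * (inv_rho_min - inv_rho_max)) ** rho
    return np.append(sigmas, 0.0)

def toy_denoiser(x, sigma, goal):
    """Exact denoiser for a toy dataset with two equally likely actions per goal.

    Stand-in for a learned network: returns E[action | noisy x, sigma, goal].
    """
    modes = np.stack([goal + 0.5, goal - 0.5])       # shape (2, action_dim)
    sq_dist = ((x[None, :] - modes) ** 2).sum(axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())          # stabilised softmax
    weights /= weights.sum()
    return weights @ modes

def sample_action(denoiser, goal, n_steps=3, rng=None):
    """Draw one action with n_steps Euler steps (one denoiser call per step)."""
    rng = np.random.default_rng() if rng is None else rng
    sigmas = karras_sigmas(n_steps)
    x = rng.normal(size=goal.shape) * sigmas[0]      # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma, goal)
        d = (x - denoised) / sigma                   # ODE derivative dx/dsigma
        x = x + (sigma_next - sigma) * d             # Euler step to next level
    return x

if __name__ == "__main__":
    goal = np.array([1.0, -2.0])
    rng = np.random.default_rng(0)
    for _ in range(5):
        print(sample_action(toy_denoiser, goal, n_steps=3, rng=rng))
```

Because the sampler only calls `denoiser(x, sigma, goal)`, the number of steps or the schedule can be changed at inference time without touching the model. In this toy setting each sample typically lands near one of the two actions, `goal + 0.5` or `goal - 0.5`, which hints at how such a policy can represent multi-modal behavior.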
Related papers
- Equivariant Diffusion Policy (2024)
- Learning A Diffusion Model Policy From Rewards Via Q-score Matching (2023)
- Contractive Diffusion Policies: Robust Action Diffusion Via Contractive Score-based Sampling With Differential Equations (2026)
- Don't Start From Scratch: Behavioral Refinement Via Interpolant-based Policy Diffusion (2024)
- Policy-guided Diffusion (2024)
- Learning To Reach Goals Via Diffusion (2023)
- Genpo: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)
- Reward-directed Score-based Diffusion Models Via Q-learning (2024)