Exploratory Diffusion Model For Unsupervised Reinforcement Learning
2025 · Chengyang Ying, Huayu Chen, Xinning Zhou, et al.
Abstract
Unsupervised reinforcement learning (URL) aims to pre-train agents by exploring diverse states or skills in reward-free environments, facilitating efficient adaptation to downstream tasks. As the agent cannot access extrinsic rewards during unsupervised exploration, existing methods design intrinsic rewards to model the explored data and encourage further exploration. However, the explored data are always heterogeneous, so both the intrinsic reward model and the pre-trained policy need strong representation ability. In this work, we propose the Exploratory Diffusion Model (ExDM), which leverages the strong expressive ability of diffusion models to fit the explored data, simultaneously boosting exploration and providing an efficient initialization for downstream tasks. Specifically, ExDM uses the diffusion model to accurately estimate the distribution of the data collected in the replay buffer and introduces a score-based intrinsic reward, encouraging the agent to e
Related papers
- Never Give Up: Learning Directed Exploration Strategies (2020), 0.00
- Exploration By Random Distribution Distillation (2025), 0.00
- Interpretable Learning Dynamics In Unsupervised Reinforcement Learning (2025), 0.00
- Diffusion Models For Reinforcement Learning: A Survey (2023), 5.64
- Goal-driven Reward By Video Diffusion Models For Reinforcement Learning (2025), 0.00
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024), 1.81
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024), 0.00
- DEIR: Efficient And Robust Exploration Through Discriminative-model-based Episodic Intrinsic Rewards (2023), 0.00