Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning
2024 · Tianyu Chen, Zhendong Wang, Mingyuan Zhou
Abstract
Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), which introduces diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. Several recent attempts have tried to accelerate DQL, but their gains in training and/or inference speed often come at the cost of degraded performance. In this paper, we introduce a dual-policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy. We bridge the two policies with a newly introduced diffusion trust region loss. The diffusion policy maintains expressiveness, while the trust region loss directs the one-step policy to explore freely and seek modes within the region defined by the diffusion policy. DTQL eliminates the need for ite
Related papers
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- One-step Flow Q-learning: Addressing The Diffusion Policy Bottleneck In Offline Reinforcement Learning (2025)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)
- How Does The Lagrangian Guide Safe Reinforcement Learning Through Diffusion Models? (2026)