Enhanced DACER Algorithm With High Diffusion Efficiency
2025 Β· Yinuo Wang, Likun Wang, Mining Tan, et al.
Abstract
Due to their expressive capacity, diffusion models have shown great promise in offline RL and imitation learning. Diffusion Actor-Critic with Entropy Regulator (DACER) extended this capability to online RL by using the reverse diffusion process as a policy approximator, achieving state-of-the-art performance. However, it still suffers from a core trade-off: more diffusion steps ensure high performance but reduce efficiency, while fewer steps degrade performance. This remains a major bottleneck for deploying diffusion policies in real-time online RL. To mitigate this, we propose DACERv2, which leverages a Q-gradient field objective with respect to action as an auxiliary optimization target to guide the denoising process at each diffusion step, thereby introducing intermediate supervisory signals that enhance the efficiency of single-step diffusion. Additionally, we observe that the independence of the Q-gradient field from the diffusion time step is inconsistent with the characteristics
Authors
(none)
Tags
Stats
Related papers
- Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning (2024)2.92
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)0.00
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)0.00
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)3.58
- Dichotomous Diffusion Policy Optimization (2025)0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)0.00
- DIAR: Diffusion-model-guided Implicit Q-learning With Adaptive Revaluation (2024)0.00
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)0.00