Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning
2024 · Linjiajie Fang, Ruoxue Liu, Jing Zhang, et al.
Abstract
In offline reinforcement learning, out-of-distribution actions must be managed to prevent overestimation of value functions. One class of methods, policy-regularized methods, addresses this problem by constraining the target policy to stay close to the behavior policy. Although several approaches represent the behavior policy as an expressive diffusion model to boost performance, it remains unclear how to regularize the target policy given a diffusion-modeled behavior sampler. In this paper, we propose Diffusion Actor-Critic (DAC), which formulates Kullback-Leibler (KL) constrained policy iteration as a diffusion noise regression problem, enabling direct representation of target policies as diffusion models. Our approach follows the actor-critic learning paradigm, in which we alternately train a diffusion-modeled target policy and a critic network. The actor training loss includes a soft Q-guidance term from the Q-gradient. The soft Q-guidance is based on the
Related papers
- Enhanced DACER Algorithm With High Diffusion Efficiency (2025)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Distributional Soft Actor-critic With Diffusion Policy (2025)
- IDQL: Implicit Q-learning As An Actor-critic Method With Diffusion Policies (2023)
- Learning A Diffusion Model Policy From Rewards Via Q-score Matching (2023)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)