Diffusion-DICE: In-sample Diffusion Guidance For Offline Reinforcement Learning
2024 · Liyuan Mao, Haoran Xu, Xianyuan Zhan, et al.
Abstract
One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized and data collection policies. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term that depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset, and we propose an in-sample learning objective to learn the second term. Due to the multi-modality of the optimal policy distribution, the transformation in Diffusion-DICE may guide samples toward local-optimal modes. We thus generate a few ca
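The score decomposition above follows from the definition of the ratio. Writing μ for the behavior policy and w*(s,a) for the optimal stationary distribution ratio, π*(a|s) ∝ μ(a|s)·w*(s,a) for a fixed state s (the proportionality constant depends only on s), so ∇_a log π*(a|s) = ∇_a log μ(a|s) + ∇_a log w*(s,a). The Python sketch below illustrates this on a hypothetical 1-D toy problem. Everything in it is an assumption made for illustration: the two-mode behavior density MODES, the linear log-ratio with strength BETA, the critic value, and the use of unadjusted Langevin dynamics in place of the paper's reverse diffusion process and its noise-level-dependent guidance term. The abstract is truncated after "generate a few ca"; the sketch assumes this refers to sampling several candidate actions and keeping the one a critic ranks highest, which may not match the paper's exact selection rule.

```python
import math
import random

# Hypothetical 1-D toy (not from the paper): a two-mode behavior policy mu(a)
# and an assumed log-ratio log w(a) = BETA * a standing in for log w*(s, a).
MODES = [(-1.0, 0.5), (2.0, 0.5)]  # (mean, std) of equal-weight components
BETA = 1.5  # assumed tilt strength of the toy ratio


def _component_logs(a: float) -> list[float]:
    """Log of each weighted Gaussian component of mu at action a."""
    return [
        math.log(0.5) - 0.5 * ((a - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
        for m, s in MODES
    ]


def log_mu(a: float) -> float:
    """Log-density of the behavior mixture, via log-sum-exp for stability."""
    logs = _component_logs(a)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def score_mu(a: float) -> float:
    """d/da log mu(a): responsibility-weighted sum of component scores.

    In Diffusion-DICE this term comes from a diffusion model trained on the
    dataset; here it is computed in closed form for the toy mixture.
    """
    logs = _component_logs(a)
    top = max(logs)
    resp = [math.exp(l - top) for l in logs]
    total = sum(resp)
    return sum(r / total * (-(a - m) / s**2) for r, (m, s) in zip(resp, MODES))


def grad_log_w(a: float) -> float:
    """Gradient of the assumed log-ratio log w(a) = BETA * a (guidance term)."""
    return BETA


def score_pi_star(a: float) -> float:
    """Score of pi* proportional to mu * w: behavior score plus guidance gradient."""
    return score_mu(a) + grad_log_w(a)


def log_pi_star_unnormalized(a: float) -> float:
    """log mu(a) + log w(a), up to an additive constant; used for checking."""
    return log_mu(a) + BETA * a


def langevin_sample(rng: random.Random, n_steps: int = 500, step: float = 1e-2) -> float:
    """Unadjusted Langevin dynamics driven by score_pi_star.

    A simple stand-in for the reverse diffusion process the paper uses.
    """
    a = rng.gauss(0.0, 2.0)
    for _ in range(n_steps):
        a += step * score_pi_star(a) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return a


def value(a: float) -> float:
    """Hypothetical critic used only to rank candidate actions."""
    return -(a - 2.5) ** 2


def guide_then_select(k: int = 8, seed: int = 0) -> float:
    """Draw k guided candidates and keep the one the toy critic scores highest."""
    rng = random.Random(seed)
    candidates = [langevin_sample(rng) for _ in range(k)]
    return max(candidates, key=value)


if __name__ == "__main__":
    h = 1e-5
    # Compare the decomposed score with a finite difference of log(mu * w).
    for a in (-1.0, 0.3, 2.4):
        fd = (log_pi_star_unnormalized(a + h) - log_pi_star_unnormalized(a - h)) / (2 * h)
        print(f"a={a:+.1f}  score={score_pi_star(a):+.5f}  finite-diff={fd:+.5f}")
    print("selected action:", guide_then_select())
```

The analytic and finite-difference scores should agree to several decimals. In this toy, tilting each component by exp(BETA·a) shifts the modes to roughly -0.625 and 2.375, so a short Langevin run can leave some candidates in the lower mode, which is the kind of local-optimal mode the abstract mentions; the selection step then discards them.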
Related papers
- ODICE: Revealing The Mystery Of Distribution Correction Estimation Via Orthogonal-gradient Update (2024)
- Dualdice: Behavior-agnostic Estimation Of Discounted Stationary Distribution Corrections (2019)
- Gradientdice: Rethinking Generalized Offline Estimation Of Stationary Values (2020)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Optidice: Offline Policy Optimization Via Stationary Distribution Correction Estimation (2021)
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024)
- Simudice: Offline Policy Optimization Through World Model Updates And DICE Estimation (2024)