Maximum Entropy Inverse Reinforcement Learning Of Diffusion Models With Energy-based Models
2024 · Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, et al.
Abstract
We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Similar to how IRL trains a policy based on the reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Diffusion by Maximum Entropy IRL (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution. The entropy maximization plays a key role in DxMI, facilitating the exploration of the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by Dynamic Programming (DxDP), a novel reinforcement learning algorithm for diffusion models, as a subr
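The joint training described in the abstract can be illustrated with a toy one-dimensional game. This is only a sketch under strong assumptions, not the DxMI or DxDP algorithm: a Gaussian sampler q = N(m, s^2) stands in for the diffusion model, a quadratic energy E(x) = a*x^2 + b*x stands in for the EBM, the data are assumed Gaussian, and every expectation is computed in closed form instead of from samples. The function name and all parameters are hypothetical. The sampler lowers its expected energy while keeping its entropy high, and the energy is lowered on data and raised on sampler output.

```python
import math


def train_toy_maxent_irl(mu_d=1.0, sigma_d=0.5, lr=0.05, steps=20000):
    """Toy 1-D maximum entropy IRL game (illustrative assumption, not DxMI).

    Sampler (stand-in for the diffusion model): q = N(m, s^2).
    Energy (stand-in for the EBM): E(x) = a * x^2 + b * x with a > 0.
    Data are assumed Gaussian with mean mu_d and std sigma_d.
    """
    m2_d = mu_d ** 2 + sigma_d ** 2   # second moment of the data
    m, log_s = 0.0, 0.0               # sampler parameters, s = exp(log_s)
    log_a, b = 0.0, 0.0               # energy parameters, a = exp(log_a)

    for _ in range(steps):
        s, a = math.exp(log_s), math.exp(log_a)
        m2_q = m ** 2 + s ** 2        # second moment of the sampler

        # Sampler loss: E_q[E(x)] - H(q) = a*(m^2 + s^2) + b*m - log(s) + const.
        grad_m = 2.0 * a * m + b
        grad_log_s = 2.0 * a * s ** 2 - 1.0

        # Energy loss: E_data[E(x)] - E_q[E(x)] = a*(m2_d - m2_q) + b*(mu_d - m).
        grad_log_a = a * (m2_d - m2_q)
        grad_b = mu_d - m

        # Simultaneous gradient steps for both players.
        m -= lr * grad_m
        log_s -= lr * grad_log_s
        log_a -= lr * grad_log_a
        b -= lr * grad_b

    return m, math.exp(log_s), math.exp(log_a), b


if __name__ == "__main__":
    m, s, a, b = train_toy_maxent_irl()
    print(f"sampler mean={m:.4f}, std={s:.4f}; energy a={a:.4f}, b={b:.4f}")
```

In this toy setting the equilibrium is the point where the sampler matches the data (m = mu_d, s = sigma_d) and exp(-E) is proportional to the data density. The entropy term is what stops the sampler from collapsing onto the energy minimum: without the -log(s) term, the gradient in log_s would always push s toward zero.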
Related papers
- A Diffusion Model Framework For Maximum Entropy Reinforcement Learning (2025)
- Sampling From Energy-based Policies Using Diffusion (2024)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)
- Bellman Diffusion: Generative Modeling As Learning A Linear Operator In The Distribution Space (2024)
- Kernel Based Maximum Entropy Inverse Reinforcement Learning For Mean-field Games (2025)
- Analytic Energy-guided Policy Optimization For Offline Reinforcement Learning (2025)
- Efficient Inference For Inverse Reinforcement Learning And Dynamic Discrete Choice Models (2025)
- Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning (2026)