A Diffusion Model Framework For Maximum Entropy Reinforcement Learning
2025 · Sebastian Sanokowski, Kaustubh Patil, Alois Knoll
Abstract
Diffusion models have achieved remarkable success in data-driven learning and in sampling from complex, unnormalized target distributions. Building on this progress, we reinterpret Maximum Entropy Reinforcement Learning (MaxEntRL) as a diffusion model-based sampling problem. We tackle this problem by minimizing the reverse Kullback-Leibler (KL) divergence between the diffusion policy and the optimal policy distribution using a tractable upper bound. By applying the policy gradient theorem to this objective, we derive a modified surrogate objective for MaxEntRL that incorporates diffusion dynamics in a principled way. This leads to simple diffusion-based variants of Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO) and Wasserstein Policy Optimization (WPO), termed DiffSAC, DiffPPO and DiffWPO. All of these methods require only minor implementation changes to their base algorithm. We find that on standard continuous control benchmarks, DiffSAC, DiffPPO and DiffWPO achieve bette
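As a rough illustration of the objective named in the abstract, the sketch below writes out the standard MaxEntRL target policy and the reverse KL divergence in generic notation. The symbols are assumptions chosen for illustration and are not taken from the paper: $\alpha$ is the entropy temperature, $Q(s,a)$ the soft action-value, $Z(s)$ the normalizer, $p_\theta(a_{0:T} \mid s)$ the reverse diffusion process whose final marginal is the policy $\pi_\theta(a_0 \mid s)$, and $q(a_{1:T} \mid a_0)$ a fixed forward noising process. The path-space inequality is the generic data-processing bound commonly used for diffusion samplers; the paper's exact bound may differ.

```latex
% Optimal MaxEntRL policy: a Boltzmann distribution over actions (unnormalized target)
\pi^*(a \mid s) = \frac{\exp\!\big(Q(s,a)/\alpha\big)}{Z(s)},
\qquad
Z(s) = \int \exp\!\big(Q(s,a)/\alpha\big)\, da

% Reverse KL between the policy and the optimal policy
D_{\mathrm{KL}}\!\big(\pi_\theta(\cdot \mid s)\,\|\,\pi^*(\cdot \mid s)\big)
= \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
  \Big[\log \pi_\theta(a \mid s) - \tfrac{1}{\alpha} Q(s,a)\Big] + \log Z(s)

% For a diffusion policy the marginal \pi_\theta(a_0 \mid s) is intractable,
% so the KL over full diffusion paths serves as an upper bound
D_{\mathrm{KL}}\!\big(\pi_\theta(a_0 \mid s)\,\|\,\pi^*(a_0 \mid s)\big)
\le
D_{\mathrm{KL}}\!\big(p_\theta(a_{0:T} \mid s)\,\|\,\pi^*(a_0 \mid s)\, q(a_{1:T} \mid a_0)\big)
```

Since $\log Z(s)$ does not depend on $\theta$, it drops out of the gradient, which is why an unnormalized target is sufficient for this kind of objective.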
Related papers
- Maximum Entropy Inverse Reinforcement Learning Of Diffusion Models With Energy-based Models (2024) 0.00
- Sampling From Energy-based Policies Using Diffusion (2024) 0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) 0.00
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019) 10.85
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024) 3.58
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023) 0.00
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) 0.00
- Maximum Entropy Diverse Exploration: Disentangling Maximum Entropy Reinforcement Learning (2019) 0.00