Sampling From Energy-based Policies Using Diffusion
2024 · Vineet Jain, Tara Akhound-Sadegh, Siamak Ravanbakhsh
Abstract
Energy-based policies offer a flexible framework for modeling complex, multimodal behaviors in reinforcement learning (RL). In maximum entropy RL, the optimal policy is a Boltzmann distribution derived from the soft Q-function, but direct sampling from this distribution in continuous action spaces is computationally intractable. As a result, existing methods typically use simpler parametric distributions, like Gaussians, for policy representation -- limiting their ability to capture the full complexity of multimodal action distributions. In this paper, we introduce a diffusion-based approach for sampling from energy-based policies, where the negative Q-function defines the energy function. Based on this approach, we propose an actor-critic method called Diffusion Q-Sampling (DQS) that enables more expressive policy representations, allowing stable learning in diverse environments. We show that our approach enhances sample efficiency in continuous control tasks and captures multimodal b
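To make the Boltzmann policy in the abstract concrete, here is a minimal sketch of pi(a|s) proportional to exp(Q(s, a) / alpha) for a one-dimensional action at a single fixed state. The function `toy_q`, the temperature `alpha = 0.5`, and the grid bounds are hypothetical choices made for illustration only; none of them come from the paper. The sketch normalises the density by brute force on a grid, which only works in very low dimensions, and it is not the diffusion-based sampler used by DQS. It only shows why a single Gaussian is a poor fit when the soft Q-function has two equally good actions.

```python
import math

def toy_q(a: float) -> float:
    """Hypothetical soft Q-value at one fixed state: two equally good actions at a = -1 and a = +1."""
    return -4.0 * (a * a - 1.0) ** 2

def boltzmann_on_grid(q_fn, lo=-2.0, hi=2.0, n=401, alpha=0.5):
    """Boltzmann density pi(a) proportional to exp(Q(a) / alpha), normalised on a uniform grid.

    The energy is E(a) = -Q(a); alpha is the temperature (an assumed value here).
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    logits = [q_fn(a) / alpha for a in grid]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights) * step  # Riemann-sum estimate of the partition function
    density = [w / z for w in weights]
    return grid, density, step

def gaussian_pdf(a: float, mu: float, var: float) -> float:
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (a - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

grid, density, step = boltzmann_on_grid(toy_q)

# Moment-match a single Gaussian to the Boltzmann density.
mean = sum(a * p for a, p in zip(grid, density)) * step
var = sum((a - mean) ** 2 * p for a, p in zip(grid, density)) * step

centre = len(grid) // 2       # grid point closest to a = 0
mode = (3 * len(grid)) // 4   # grid point closest to a = +1

boltzmann_at_zero = density[centre]
boltzmann_at_mode = density[mode]
gaussian_at_zero = gaussian_pdf(0.0, mean, var)

print(f"Boltzmann density at a=0:  {boltzmann_at_zero:.5f}")
print(f"Boltzmann density at a=+1: {boltzmann_at_mode:.5f}")
print(f"Matched Gaussian at a=0:   {gaussian_at_zero:.5f}")
```

In this toy setting the moment-matched Gaussian places its peak at a = 0, between the two modes, which is exactly where the Boltzmann density is lowest. Grid normalisation scales exponentially with the action dimension, which is one way to see why direct sampling is impractical in continuous control and why an approximate sampler is needed.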
Related papers
- A Diffusion Model Framework For Maximum Entropy Reinforcement Learning (2025) 0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) 0.00
- Maximum Entropy Inverse Reinforcement Learning Of Diffusion Models With Energy-based Models (2024) 0.00
- Direct Soft-policy Sampling Via Langevin Dynamics (2026) 0.00
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024) 3.58
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) 0.00
- Distributional Soft Actor-critic With Diffusion Policy (2025) 0.00
- Learning A Diffusion Model Policy From Rewards Via Q-score Matching (2023) 0.00