Analytic Energy-guided Policy Optimization For Offline Reinforcement Learning
2025 · Jifeng Hu, Sili Huang, Zhejian Yang, et al.
Abstract
Conditional decision generation with diffusion models has proven highly competitive in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable because of its log-expectation formulation during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and the closed-form solution of the intermediate guidance when the diffusion model obeys the conditional Gaussian transformation. Then, we analyze the posterior Gaussian distribution in the log-expectation formulation and obtain the target estimation of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approximate the target estimation of the log-expectation formulation. We apply our method to 30+ offline RL tasks to d
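To make the log-expectation difficulty concrete, here is a minimal sketch, not the paper's derivation. It assumes, purely for illustration, that the posterior over the clean action is an isotropic Gaussian N(mu, sigma^2 I) and that the value being exponentiated is linear in the action, Q(a) = c + g·a. Under those assumptions the log-expectation has a closed form given by the Gaussian moment-generating function, and a Monte Carlo estimate can be checked against it. The names `beta`, `c`, `g`, `mu`, and `sigma` are hypothetical and do not come from the paper.

```python
import numpy as np


def log_expectation_mc(beta, c, g, mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of log E_{a ~ N(mu, sigma^2 I)}[exp(beta * (c + g.a))]."""
    rng = np.random.default_rng(seed)
    # Sample actions from the assumed isotropic Gaussian posterior.
    a = mu + sigma * rng.standard_normal((n_samples, mu.shape[0]))
    z = beta * (c + a @ g)
    # Numerically stable log-mean-exp.
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))


def log_expectation_closed_form(beta, c, g, mu, sigma):
    """Closed form via the Gaussian moment-generating function (linear Q only)."""
    return beta * (c + g @ mu) + 0.5 * beta**2 * sigma**2 * (g @ g)


# Illustrative values only (assumptions, not taken from the paper).
beta, c, sigma = 1.0, 0.5, 0.5
g = np.array([0.3, -0.2])
mu = np.array([0.1, 0.4])

mc_value = log_expectation_mc(beta, c, g, mu, sigma)
exact_value = log_expectation_closed_form(beta, c, g, mu, sigma)
print("Monte Carlo estimate:", mc_value)
print("Closed form:         ", exact_value)
```

For a general nonlinear Q no such closed form exists, which is why a tractable target for the intermediate energy requires additional assumptions of the kind the abstract mentions.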
Related papers
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024) · 0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) · 0.00
- DiffPoGAN: Diffusion Policies With Generative Adversarial Networks For Offline Reinforcement Learning (2024) · 0.00
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024) · 3.58
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) · 0.00
- Sampling From Energy-based Policies Using Diffusion (2024) · 0.00
- DiffCPS: Diffusion Model Based Constrained Policy Search For Offline Reinforcement Learning (2023) · 1.91
- Dichotomous Diffusion Policy Optimization (2025) · 0.00