DiffCPS: Diffusion Model Based Constrained Policy Search For Offline Reinforcement Learning
2023 · Longxiang He, Li Shen, Linrui Zhang, et al.
Abstract
Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning and is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying state-of-the-art models with strong distribution expression capabilities (i.e., diffusion models) in the AWR framework is intractable, since AWR requires exact policy probability densities, which diffusion models do not provide. In this paper, we propose a novel approach, \(\textbf{Diffusion-based Constrained Policy Search}\) (dubbed DiffCPS), which tackles diffusion-based constrained policy search with the primal-dual method. The theoretical analysis reveals that strong duality holds for diffusion-based CPS problems, and upon introducing parameter approximation, an approximated solution can be obtained after \(\mathcal{O}(1/\epsilon)\) number of dual iter
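To make the constrained policy search idea concrete, here is a minimal toy sketch; it is not the DiffCPS algorithm. It assumes a single state with four discrete actions, a known Q-vector, a uniform behavior policy, and the constraint KL(pi || pi_b) <= eps; all names (`tilted_policy`, `dual_descent`, `lam`) are hypothetical. For this tabular case the inner maximization has the closed-form exponentially weighted solution familiar from AWR, which needs exact probabilities, so only the dual variable is updated iteratively.

```python
import math

def tilted_policy(q, pi_b, lam):
    """Maximizer of E_pi[Q] - lam * KL(pi || pi_b): pi is proportional to pi_b * exp(Q / lam)."""
    logits = [math.log(b) + qi / lam for qi, b in zip(q, pi_b)]
    m = max(logits)  # subtract the max to keep exp() numerically stable
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]

def kl(p, ref):
    """KL(p || ref) for discrete distributions."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, ref) if pi > 0.0)

def dual_descent(q, pi_b, eps, lam=1.0, lr=1.0, iters=500, lam_min=1e-3):
    """Gradient descent on the dual function over the multiplier lam >= lam_min."""
    for _ in range(iters):
        pi = tilted_policy(q, pi_b, lam)
        # The dual gradient w.r.t. lam is eps - KL, so lam grows while the
        # constraint is violated and shrinks while there is slack.
        lam = max(lam_min, lam + lr * (kl(pi, pi_b) - eps))
    return tilted_policy(q, pi_b, lam), lam

# Toy data (assumed for illustration only).
q = [1.0, 0.5, 0.0, -0.5]
pi_b = [0.25, 0.25, 0.25, 0.25]
eps = 0.1
pi, lam = dual_descent(q, pi_b, eps)
```

In this toy setting the returned policy sits on the constraint boundary and shifts probability mass toward higher-valued actions. A diffusion policy exposes no exact densities, so the closed-form step in `tilted_policy` is unavailable there, which is the obstacle the abstract describes.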
Related papers
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024) 0.00
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) 0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) 0.00
- Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning (2024) 2.92
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025) 0.00
- DiffPoGAN: Diffusion Policies With Generative Adversarial Networks For Offline Reinforcement Learning (2024) 0.00
- Contractive Diffusion Policies: Robust Action Diffusion Via Contractive Score-based Sampling With Differential Equations (2026) 0.00
- Boosting Continuous Control With Consistency Policy (2023) 3.58