DiffCPS: Diffusion Model Based Constrained Policy Search for Offline Reinforcement Learning

Abstract

Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying state-of-the-art models with distribution expression capabilities (i.e., diffusion models) in the AWR framework is infeasible, since AWR requires exact policy probability densities, which are intractable in diffusion models. In this paper, we propose a novel approach, \(\textbf{Diffusion-based Constrained Policy Search}\) (dubbed DiffCPS), which tackles diffusion-based constrained policy search with the primal-dual method. The theoretical analysis reveals that strong duality holds for diffusion-based CPS problems, and upon introducing parameter approximation, an approximated solution can be obtained after \(\mathcal{O}(1/\epsilon)\) number of dual iter
