Distributional Soft Actor-critic With Diffusion Policy
2025 · Tong Liu, Yinuo Wang, Xujie Song, et al.
Abstract
Reinforcement learning has proven highly effective for complex control tasks. Traditional methods typically model the value distribution with a unimodal distribution, such as a Gaussian. However, a unimodal distribution often biases value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address two challenges: bias in value function estimation and obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy is established by introducing policy entropy and a value distribution function. A diffusion value network that can accurately characterize multi-peaked distributions is constructed by generating a set of reward samples through reverse sampling with a diffusion model. Based on this, a dis
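The abstract's claim that a unimodal model biases value estimation can be illustrated with a small numerical sketch. This is not the paper's diffusion value network; it only assumes a hypothetical two-peaked return distribution (modes at -5 and +5, chosen for illustration) and fits a single Gaussian to it by moment matching. The helper names `bimodal_returns` and `gaussian_mass` are made up for this example.

```python
import math
import numpy as np

def bimodal_returns(n, rng):
    """Draw n hypothetical returns from a two-peaked mixture (illustrative only)."""
    modes = rng.choice([-5.0, 5.0], size=n)      # two equally likely outcomes
    return modes + rng.normal(0.0, 0.5, size=n)  # small spread around each mode

def gaussian_mass(lo, hi, mean, std):
    """Probability that N(mean, std^2) assigns to the interval [lo, hi]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

rng = np.random.default_rng(0)
returns = bimodal_returns(10_000, rng)

# Moment-matched unimodal (Gaussian) fit to the sampled returns.
mean, std = float(returns.mean()), float(returns.std())

# Compare probability mass in a window centred on the fitted mean.
lo, hi = mean - 1.0, mean + 1.0
empirical = float(np.mean((returns >= lo) & (returns <= hi)))
predicted = gaussian_mass(lo, hi, mean, std)

print(f"fitted mean={mean:.2f}, std={std:.2f}")
print(f"mass near the mean: empirical={empirical:.3f}, Gaussian fit={predicted:.3f}")
```

Under these assumptions the Gaussian fit places a noticeable share of its probability around the mean, a region where almost no sampled returns fall. A sample-based representation, such as the set of samples a diffusion model draws, does not have to commit to a single peak.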
Related papers
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) · 17.77
- Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning (2024) · 2.92
- DSAC: Distributional Soft Actor-critic For Risk-sensitive Reinforcement Learning (2020) · 7.81
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025) · 0.00
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026) · 0.00
- Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms (2024) · 0.00
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023) · 0.00
- Value-distributional Model-based Reinforcement Learning (2023) · 1.56