Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors
2020 · Jingliang Duan, Yang Guan, Shengbo Eben Li, et al.
Abstract
In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimation, which greatly reduces policy performance. This paper presents the distributional soft actor-critic (DSAC) algorithm, an off-policy RL method for continuous control settings that improves policy performance by mitigating Q-value overestimation. We first show in theory that learning a distribution function of state-action returns can effectively mitigate Q-value overestimation because it adaptively adjusts the update step size of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution by keeping the variance of the state-action returns within a reasonable range to address exploding and vanishing gra
Related papers
- Distributional Soft Actor-critic With Diffusion Policy (2025)
- DSAC: Distributional Soft Actor-critic For Risk-sensitive Reinforcement Learning (2020)
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025)
- Value-distributional Model-based Reinforcement Learning (2023)
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)
- Revisiting Discrete Soft Actor-critic (2022)
- Langevin Soft Actor-critic: Efficient Exploration Through Uncertainty-driven Critic Learning (2025)