Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor
2018 · Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, et al.
Abstract
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control b
Related papers
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019) · 10.85
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019) · 0.00
- S\(^2\)AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic (2024) · 2.41
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025) · 0.00
- ACE: Off-policy Actor-critic With Causality-aware Entropy Regularization (2024) · 0.00
- Langevin Soft Actor-critic: Efficient Exploration Through Uncertainty-driven Critic Learning (2025) · 0.00
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) · 17.77
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023) · 0.00