S\(^2\)AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic
2024 · Safa Messaoud, Billel Mokeddem, Zhenghai Xue, et al.
Abstract
Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open problem. To address this, previous MaxEnt RL methods either implicitly estimate the entropy, resulting in high computational complexity and variance (SQL), or follow a variational inference procedure that fits simplified actor distributions (e.g., Gaussian) for tractability (SAC). We propose Stein Soft Actor-Critic (S\(^2\)AC), a MaxEnt RL algorithm that learns expressive policies without compromising efficiency. Specifically, S\(^2\)AC uses parameterized Stein Variational Gradient Descent (SVGD) as the underlying policy. We derive a closed-form expression of the entropy of such policies. Our formu
Related papers
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023)
- Do You Need The Entropy Reward (in Practice)? (2022)
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019)
- Max-entropy Reinforcement Learning With Flow Matching And A Case Study On LQR (2025)
- Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-critic With Advantage Weighted Mixture Policy (SAC-AWMP) (2020)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)