Wasserstein Barycenter Soft Actor-Critic
2025 · Zahra Shahrooei, Ali Baheri
Abstract
Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which uses a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. The exploration policy is the Wasserstein barycenter of the pessimistic and optimistic policies, and the degree of exploration is adjusted throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.
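To make the barycenter idea concrete, here is a minimal sketch. It is not the authors' implementation. It assumes both actors output diagonal Gaussian policies and that the 2-Wasserstein distance is used; under those assumptions the barycenter is again a diagonal Gaussian whose mean and standard deviation are the weighted averages of the two policies' means and standard deviations. The function names and the `weight` parameter (the share given to the optimistic policy) are hypothetical and do not come from the abstract.

```python
import numpy as np


def gaussian_w2_barycenter(mu_pess, std_pess, mu_opt, std_opt, weight):
    """2-Wasserstein barycenter of two diagonal Gaussian policies.

    Assumption (not stated in the abstract): both policies are
    N(mu, diag(std**2)). `weight` in [0, 1] is a hypothetical knob for the
    share placed on the optimistic policy; 0 recovers the pessimistic
    policy and 1 recovers the optimistic one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mu_pess, std_pess = np.asarray(mu_pess, float), np.asarray(std_pess, float)
    mu_opt, std_opt = np.asarray(mu_opt, float), np.asarray(std_opt, float)
    # For diagonal Gaussians the W2 barycenter interpolates the means and
    # the standard deviations (not the variances) linearly, per dimension.
    mu = (1.0 - weight) * mu_pess + weight * mu_opt
    std = (1.0 - weight) * std_pess + weight * std_opt
    return mu, std


def sample_exploration_action(rng, mu, std):
    """Draw one action from the barycenter policy N(mu, diag(std**2))."""
    return mu + std * rng.standard_normal(mu.shape)


# Usage: a 2-D action space with illustrative (made-up) policy parameters.
rng = np.random.default_rng(0)
mu, std = gaussian_w2_barycenter(
    mu_pess=[0.0, 1.0], std_pess=[0.2, 0.4],
    mu_opt=[1.0, 3.0], std_opt=[0.6, 0.8],
    weight=0.5,
)
action = sample_exploration_action(rng, mu, std)
```

In this sketch, moving `weight` toward 0 over the course of training is one way to reduce exploration gradually; how WBSAC actually schedules the degree of exploration is not specified in the abstract.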
Related papers
- Wasserstein Actor-critic: Directed Exploration Via Optimism For Continuous-actions Control (2023) · 2.26
- Improving Actor-critic Training With Steerable Action-value Approximation Errors (2024) · 0.00
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021) · 0.00
- Langevin Soft Actor-critic: Efficient Exploration Through Uncertainty-driven Critic Learning (2025) · 0.00
- Stochastic Actor-critic: Mitigating Overestimation Via Temporal Aleatoric Uncertainty (2026) · 0.00
- Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States (2022) · 5.24
- Monte Carlo Beam Search For Actor-critic Reinforcement Learning In Continuous Control (2025) · 0.00
- Tactical Optimism And Pessimism For Deep Reinforcement Learning (2021) · 0.00