DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty
2025 · Mingxuan Cui, Duo Zhou, Yuxuan Han, et al.
Abstract
Deep reinforcement learning (RL) has achieved remarkable success, yet its deployment in real-world scenarios is often limited by vulnerability to environmental uncertainties. Distributionally robust RL (DR-RL) algorithms have been proposed to address this challenge, but existing approaches are largely restricted to value-based methods in tabular settings. In this work, we introduce Distributionally Robust Soft Actor-Critic (DR-SAC), the first actor-critic-based DR-RL algorithm for offline learning in continuous action spaces. DR-SAC maximizes the entropy-regularized reward against the worst possible transition models within a KL-divergence-constrained uncertainty set. We derive a distributionally robust version of soft policy iteration with a convergence guarantee and incorporate a generative modeling approach to estimate the unknown nominal transition models. Experimental results on five continuous RL tasks demonstrate that our algorithm achieves up to 9.8 times higher average reward
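The worst-case objective mentioned in the abstract can be illustrated on a toy discrete distribution. The sketch below is not the paper's implementation; it only uses the standard dual form of a KL-constrained worst-case expectation, inf over P with KL(P || P0) <= delta of E_P[V] = sup over beta > 0 of -beta * log E_P0[exp(-V / beta)] - beta * delta. The function name `kl_robust_expectation`, the grid of `beta` candidates, and the example numbers are all assumptions made for illustration.

```python
import math


def kl_robust_expectation(values, probs, delta, betas=None):
    """Approximate the worst-case mean of `values` over all distributions P
    with KL(P || probs) <= delta, using the dual form

        sup_{beta > 0}  -beta * log E_{P0}[exp(-v / beta)] - beta * delta.

    The supremum is taken over a coarse log-spaced grid of beta values
    (an illustrative choice), so the result is a lower bound on the true
    worst-case mean.
    """
    if betas is None:
        # Hypothetical default grid: 1e-3 up to 1e4, 20 points per decade.
        betas = [10.0 ** (k / 20.0) for k in range(-60, 81)]

    support = [(v, p) for v, p in zip(values, probs) if p > 0]
    v_min = min(v for v, _ in support)

    best = -math.inf
    for beta in betas:
        # Shift by v_min so every exponent is <= 0 (numerically stable).
        s = sum(p * math.exp(-(v - v_min) / beta) for v, p in support)
        dual = v_min - beta * math.log(s) - beta * delta
        best = max(best, dual)
    return best


if __name__ == "__main__":
    next_state_values = [0.0, 1.0, 2.0]   # toy values of three next states
    nominal_probs = [0.2, 0.3, 0.5]       # toy nominal transition probabilities
    for delta in (0.0, 0.1, 0.5):
        robust = kl_robust_expectation(next_state_values, nominal_probs, delta)
        print(f"delta={delta}: robust value ~ {robust:.4f}")
```

With `delta = 0` the result approaches the nominal mean, and it decreases as the uncertainty radius `delta` grows, which is the pessimism a robust critic target would inherit.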
Related papers
- DSAC: Distributional Soft Actor-critic For Risk-sensitive Reinforcement Learning (2020) · 7.81
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) · 17.77
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019) · 0.00
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023) · 0.00
- Distributional Soft Actor-critic With Diffusion Policy (2025) · 0.00
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018) · 0.00
- Broad Critic Deep Actor Reinforcement Learning For Continuous Control (2024) · 0.00
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023) · 0.00