Adversarially Trained Weighted Actor-critic For Safe Offline Reinforcement Learning
2024 · Honghao Wei, Xiyue Peng, Arnob Ghosh, et al.
Abstract
We propose WSAC (Weighted Safe Actor-Critic), a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation, which can robustly optimize policies to improve upon an arbitrary reference policy with limited data coverage. WSAC is designed as a two-player Stackelberg game to optimize a refined objective function. The actor optimizes the policy against two adversarially trained value critics with small importance-weighted Bellman errors, which focus on scenarios where the actor's performance is inferior to the reference policy. In theory, we demonstrate that when the actor employs a no-regret optimization oracle, WSAC achieves a number of guarantees: (i) For the first time in the safe offline RL setting, we establish that WSAC can produce a policy that outperforms any reference policy while maintaining the same level of safety, which is critical to designing a safe algorithm for offline RL. (ii) WSAC achieves the optimal statistical convergence rate of \(1/\
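The actor-critic game described in the abstract can be illustrated with a toy, finite-class sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the tabular setting, the deterministic policies, the finite critic, weight, and policy classes, the use of the dataset's own actions as the reference policy, the hinge penalty `max(gap, 0)` on the cost critic, and the hypothetical parameters `beta` (Bellman-error weight) and `lam` (cost penalty). For a fixed policy, each critic is chosen adversarially. The reward critic is the candidate most pessimistic about the policy's advantage over the reference, and the cost critic is the candidate most pessimistic about its extra cost. Both are penalized by an importance-weighted Bellman error. Exhaustive search over the candidate policies stands in for the no-regret optimization oracle.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular types used only for this sketch.
State = int
Action = int
Transition = Tuple[State, Action, float, float, State]  # (s, a, reward, cost, s')
Critic = Dict[Tuple[State, Action], float]
Weight = Dict[Tuple[State, Action], float]
Policy = Dict[State, Action]  # deterministic policy, for simplicity


def weighted_bellman_error(f: Critic, pi: Policy, data: List[Transition],
                           weights: List[Weight], gamma: float,
                           use_cost: bool) -> float:
    """max over w of |E_D[w(s,a) * (f(s,a) - signal - gamma * f(s', pi(s')))]|."""
    worst = 0.0
    for w in weights:
        total = 0.0
        for s, a, r, c, s2 in data:
            signal = c if use_cost else r
            td = f[(s, a)] - signal - gamma * f[(s2, pi[s2])]
            total += w[(s, a)] * td
        worst = max(worst, abs(total / len(data)))
    return worst


def relative_gap(f: Critic, pi: Policy, data: List[Transition]) -> float:
    """E_D[f(s, pi(s)) - f(s, a)]: critic value of pi relative to the data actions."""
    return sum(f[(s, pi[s])] - f[(s, a)] for s, a, _, _, _ in data) / len(data)


def reward_critic(pi, critics, data, weights, gamma, beta):
    # Pessimistic reward critic: smallest relative value of pi,
    # plus a penalty for a large weighted Bellman error.
    return min(critics, key=lambda f: relative_gap(f, pi, data)
               + beta * weighted_bellman_error(f, pi, data, weights, gamma, False))


def cost_critic(pi, critics, data, weights, gamma, beta):
    # Pessimistic cost critic: largest relative cost of pi,
    # minus a penalty for a large weighted Bellman error.
    return max(critics, key=lambda f: relative_gap(f, pi, data)
               - beta * weighted_bellman_error(f, pi, data, weights, gamma, True))


def actor_score(pi, reward_critics, cost_critics, data, weights, gamma, beta, lam):
    fr = reward_critic(pi, reward_critics, data, weights, gamma, beta)
    fc = cost_critic(pi, cost_critics, data, weights, gamma, beta)
    # Assumed hinge: penalize only when pi's estimated cost exceeds the reference's.
    return relative_gap(fr, pi, data) - lam * max(relative_gap(fc, pi, data), 0.0)


def best_policy(policies, reward_critics, cost_critics, data, weights,
                gamma=0.9, beta=10.0, lam=1.0):
    # Exhaustive search over a finite policy set stands in for the no-regret oracle.
    return max(policies, key=lambda pi: actor_score(
        pi, reward_critics, cost_critics, data, weights, gamma, beta, lam))
```

On a one-state toy dataset where action 1 earns more reward than action 0 but also incurs a cost, and with `gamma=0.0`, a small penalty such as `lam=1.0` selects the higher-reward action, while `lam=3.0` makes the cost penalty dominate and selects the cost-free action.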
Related papers
- FAWAC: Feasibility Informed Advantage Weighted Regression For Persistent Safety In Offline Reinforcement Learning (2024)
- Importance Weighted Actor-critic For Optimal Conservative Offline Reinforcement Learning (2023)
- Optimization Solution Functions As Deterministic Policies For Offline Reinforcement Learning (2024)
- Offline-boosted Actor-critic: Adaptively Blending Optimal Historical Behaviors In Deep Off-policy RL (2024)
- Provable Benefits Of Actor-critic Methods For Offline Reinforcement Learning (2021)
- BRAC+: Improved Behavior Regularized Actor Critic For Offline Reinforcement Learning (2021)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Wasserstein Barycenter Soft Actor-critic (2025)