Constraint-adaptive Policy Switching For Offline Safe Reinforcement Learning
2024 · Yassine Chemingui, Aryan Deshwal, Honghao Wei, et al.
Abstract
Offline safe reinforcement learning (OSRL) involves learning, from a fixed batch of training data, a decision-making policy that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation, each optimizing a different reward and cost trade-off. During testing, CAPS switches between those policies by selecting, at each state, the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/y
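The test-time switching rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single reward critic `q_reward` and a single cost critic `q_cost` that score the action proposed by each candidate policy, compares the estimated future cost directly against a `cost_limit`, and falls back to the lowest-cost candidate when no policy looks feasible. The function `caps_select`, the critic names, and the fallback rule are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = float


def caps_select(
    state: State,
    policies: Sequence[Callable[[State], Action]],
    q_reward: Callable[[State, Action], float],  # hypothetical reward critic
    q_cost: Callable[[State, Action], float],    # hypothetical cost critic
    cost_limit: float,
) -> Tuple[Action, int]:
    """Return the action (and index) of the candidate policy with the highest
    estimated future reward among those whose estimated future cost fits the limit."""
    # Each trained policy proposes one action for the current state.
    proposals: List[Action] = [pi(state) for pi in policies]
    rewards = [q_reward(state, a) for a in proposals]
    costs = [q_cost(state, a) for a in proposals]

    # Keep only candidates whose estimated cost satisfies the current constraint.
    feasible = [i for i, c in enumerate(costs) if c <= cost_limit]
    if feasible:
        best = max(feasible, key=lambda i: rewards[i])
    else:
        # Assumption: if no candidate looks safe, take the lowest estimated cost.
        best = min(range(len(proposals)), key=lambda i: costs[i])
    return proposals[best], best


# Toy usage: three constant policies; larger actions earn more reward but cost more.
policies = [lambda s: 0.0, lambda s: 0.5, lambda s: 1.0]


def toy_reward(s: State, a: Action) -> float:
    return a


def toy_cost(s: State, a: Action) -> float:
    return 10.0 * a


for limit in (20.0, 6.0, -1.0):
    action, idx = caps_select((0.0,), policies, toy_reward, toy_cost, limit)
    print(limit, action, idx)
# 20.0 1.0 2
# 6.0 0.5 1
# -1.0 0.0 0
```

Because the cost limit enters only at the selection step, tightening or loosening it at deployment changes which trained policy is chosen without retraining any of them.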