Constraints Penalized Q-learning For Safe Offline Reinforcement Learning
2021 · Haoran Xu, Xianyuan Zhan, Xiangyu Zhu
Abstract
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naive approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly
Related papers
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- State-constrained Offline Reinforcement Learning (2024)
- ACL-QL: Adaptive Conservative Level In Q-learning For Offline Reinforcement Learning (2024)
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024)
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024)
- Constraint-adaptive Policy Switching For Offline Safe Reinforcement Learning (2024)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)