Offline Safe Reinforcement Learning Using Trajectory Classification
2024 · Ze Gong, Akshat Kumar, Pradeep Varakantham
Abstract
Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints at each time step (derived from global cost constraints), and this can result in either overly conservative policies or violations of safety constraints. In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories. Specifically, we first partition the pre-collected dataset of state-action trajectories into desirable and undesirable subsets. Intuitively, the desirable set contains high-reward, safe trajectories, and the undesirable set contains unsafe trajectories and low-reward safe trajectories. Second, we learn a policy that generates desirable trajectories and avoids undesirable trajectories, where (un)desirability scores are provided by a classifier learnt from the dataset of desirable and undesirable trajectories.
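The abstract does not spell out the exact partition rule, so the following is only a minimal sketch of the first step under stated assumptions: a trajectory counts as safe when its cumulative cost stays within a budget, the top fraction of safe trajectories ranked by return forms the desirable set, and everything else (unsafe, or safe but low-return) is undesirable. `Trajectory`, `partition`, `cost_limit`, and `top_frac` are hypothetical names for illustration, not the authors' code; the classifier and policy-learning step is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    # Hypothetical container: per-step rewards and costs of one logged episode.
    rewards: List[float]
    costs: List[float]

    @property
    def ret(self) -> float:
        # Undiscounted return of the episode.
        return sum(self.rewards)

    @property
    def total_cost(self) -> float:
        # Cumulative (episode-level) cost, compared against a global budget.
        return sum(self.costs)


def partition(
    trajs: List[Trajectory], cost_limit: float, top_frac: float = 0.5
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Split trajectories into (desirable, undesirable).

    Assumed rule (not taken from the paper): safe means total_cost <= cost_limit;
    the top `top_frac` of safe trajectories by return are desirable, and the
    remaining safe trajectories plus all unsafe ones are undesirable.
    """
    safe = [t for t in trajs if t.total_cost <= cost_limit]
    unsafe = [t for t in trajs if t.total_cost > cost_limit]
    # Rank safe trajectories by return, highest first.
    ranked = sorted(safe, key=lambda t: t.ret, reverse=True)
    n_keep = math.ceil(top_frac * len(ranked))
    desirable = ranked[:n_keep]
    undesirable = ranked[n_keep:] + unsafe
    return desirable, undesirable


# Toy usage with a cost budget of 2.0.
data = [
    Trajectory(rewards=[1.0, 1.0, 1.0], costs=[0.0, 0.0, 0.0]),  # return 3.0, cost 0
    Trajectory(rewards=[2.0, 2.0, 2.0], costs=[1.0, 1.0, 1.0]),  # return 6.0, cost 3 (unsafe)
    Trajectory(rewards=[0.5, 0.5, 0.5], costs=[0.0, 1.0, 0.0]),  # return 1.5, cost 1
    Trajectory(rewards=[2.0, 1.0, 1.0], costs=[0.0, 0.0, 1.0]),  # return 4.0, cost 1
]
good, bad = partition(data, cost_limit=2.0, top_frac=0.5)
print([t.ret for t in good])  # [4.0, 3.0]
print([t.ret for t in bad])   # [1.5, 6.0]
```

Note that the highest-return trajectory (return 6.0) lands in the undesirable set because it exceeds the cost budget, which matches the abstract's point that unsafe trajectories are undesirable regardless of reward.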
Related papers
- Safe Offline Reinforcement Learning With Real-time Budget Constraints (2023)
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021)
- Safemil: Learning Offline Safe Imitation Policy From Non-preferred Trajectories (2025)
- Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting (2023)
- Provably Efficient Offline Reinforcement Learning With Trajectory-wise Reward (2022)
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024)
- Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets (2023)
- Trajdeleter: Enabling Trajectory Forgetting In Offline Reinforcement Learning Agents (2024)