Safe Reinforcement Learning Using Action Projection: Safeguard The Policy Or The Environment?
2025 · Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, et al.
Abstract
Projection-based safety filters, which modify unsafe actions by mapping them to the closest safe alternative, are widely used to enforce safety constraints in reinforcement learning (RL). Two integration strategies are commonly considered: Safe environment RL (SE-RL), where the safeguard is treated as part of the environment, and safe policy RL (SP-RL), where it is embedded within the policy through differentiable optimization layers. Despite their practical relevance in safety-critical settings, a formal understanding of their differences is lacking. In this work, we present a theoretical comparison of SE-RL and SP-RL. We identify a key distinction in how each approach is affected by action aliasing, a phenomenon in which multiple unsafe actions are projected to the same safe action, causing information loss in the policy gradients. In SE-RL, this effect is implicitly approximated by the critic, while in SP-RL, it manifests directly as rank-deficient Jacobians during backpropagation t
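The action aliasing effect described in the abstract can be illustrated with a minimal sketch. The box-shaped safe set `[-1, 1]^2`, the function names, and the sample actions are illustrative assumptions, not taken from the paper; for a box, Euclidean projection reduces to elementwise clipping.

```python
# Minimal sketch of action aliasing under projection (assumptions: the safe
# set is the box [-1, 1]^2 and all names here are hypothetical).

def project_box(action, low=-1.0, high=1.0):
    """Euclidean projection onto the box [low, high]^n (elementwise clipping)."""
    return [min(max(a, low), high) for a in action]

def projection_jacobian_diag(action, low=-1.0, high=1.0):
    """Diagonal of the projection's Jacobian: 1 where a coordinate lies
    strictly inside the box, 0 where it is clipped to the boundary."""
    return [1.0 if low < a < high else 0.0 for a in action]

# Two different unsafe actions that differ only in the clipped coordinate.
a1 = [2.0, 0.5]
a2 = [3.0, 0.5]

safe1 = project_box(a1)
safe2 = project_box(a2)

# Aliasing: both unsafe actions map to the same safe action.
aliased = safe1 == safe2

# For a diagonal Jacobian, the rank is the number of nonzero diagonal entries.
jac_diag = projection_jacobian_diag(a1)
rank = int(sum(jac_diag))

print(safe1, safe2, aliased, jac_diag, rank)
```

In this sketch the Jacobian has rank 1 in a 2-dimensional action space, so a gradient backpropagated through the projection carries no signal along the clipped coordinate. This is the kind of information loss the abstract attributes to rank-deficient Jacobians in SP-RL.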
Related papers
- Safe Reinforcement Learning Via Projection On A Safe Set: How To Achieve Optimality? (2020)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)
- Actsafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Safe Reinforcement Learning In Black-box Environments Via Adaptive Shielding (2024)
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025)
- Safety Aware Reinforcement Learning (SARL) (2020)
- Enhancing Efficiency Of Safe Reinforcement Learning Via Sample Manipulation (2024)