Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards
2025 · Hanping Zhang, Yuhong Guo
Abstract
Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints, as exceeding safety violation limits can result in severe consequences. In this paper, we propose a novel safe RL approach called Safety Modulated Policy Optimization (SMPO), which enables safe policy function learning within the standard policy optimization framework through safety modulated rewards. In particular, we consider safety violation costs as feedback from the RL environments that is parallel to the standard rewards, and introduce a Q-cost function as a safety critic to estimate expected future cumulative costs. Then we propose to modulate the rewards using a cost-aware weighting function, which is carefully designed to ensure the safety limits based on the estimation of the safety critic, while maximizing the expected rewards. The policy function and the safety critic are simultaneously learned through gradient descent du
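The idea of scaling rewards by a weight derived from a safety critic can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not give the form of the weighting function, so the sigmoid used here, the names safety_weight, modulated_reward and td_update_q_cost, and the parameters cost_limit, temperature, gamma and lr are all assumptions, not the paper's definitions. The critic update shown is a plain tabular TD(0) step, swapped in for whatever gradient-based update the paper uses.

```python
import math


def safety_weight(q_cost: float, cost_limit: float, temperature: float = 1.0) -> float:
    """Hypothetical cost-aware weight in (0, 1).

    Close to 1 when the estimated future cost is well below the limit,
    close to 0 when it is well above. The sigmoid shape is an assumption.
    """
    z = (cost_limit - q_cost) / temperature
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def modulated_reward(reward: float, q_cost: float, cost_limit: float,
                     temperature: float = 1.0) -> float:
    """Scale the environment reward by the safety weight.

    Assumes non-negative rewards; shrinking a negative reward toward zero
    would make an unsafe action look better, not worse.
    """
    return safety_weight(q_cost, cost_limit, temperature) * reward


def td_update_q_cost(q_cost: float, cost: float, next_q_cost: float,
                     gamma: float = 0.99, lr: float = 0.1) -> float:
    """One tabular TD(0) step moving the Q-cost estimate toward
    cost + gamma * next_q_cost (a stand-in for a learned safety critic)."""
    target = cost + gamma * next_q_cost
    return q_cost + lr * (target - q_cost)
```

Under these assumptions, a policy optimizer would be fed modulated_reward(r, q_c, d) in place of r, so that state-action pairs whose estimated cumulative cost q_c exceeds the limit d contribute almost nothing to the return.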
Related papers
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- Enhancing Efficiency Of Safe Reinforcement Learning Via Sample Manipulation (2024)
- Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning (2021)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- Safe Policy Optimization With Local Generalized Linear Function Approximations (2021)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)
- Constraint-conditioned Policy Optimization For Versatile Safe Reinforcement Learning (2023)
- Safe-support Q-learning: Learning Without Unsafe Exploration (2026)