Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning
2021 · Yecheng Jason Ma, Andrew Shen, Osbert Bastani, et al.
Abstract
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Fur
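The cost-inflation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured or how the penalty is scaled, so everything specific here is an assumption for illustration only: uncertainty is taken to be the disagreement (population standard deviation) among an ensemble of learned cost predictions, and the names `conservative_cost`, `satisfies_conservative_constraint`, `kappa`, and `budget` are hypothetical.

```python
from statistics import mean, pstdev


def conservative_cost(ensemble_costs, kappa):
    """Return the mean predicted cost plus kappa times ensemble disagreement.

    ensemble_costs: cost predictions for one step, one per ensemble member
                    (assumed uncertainty proxy, not taken from the paper).
    kappa:          hypothetical non-negative penalty weight.
    """
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    return mean(ensemble_costs) + kappa * pstdev(ensemble_costs)


def satisfies_conservative_constraint(step_predictions, kappa, budget):
    """Check a simulated rollout against the inflated cost constraint.

    step_predictions: one list of ensemble cost predictions per time step.
    budget:           hypothetical cost limit for the whole rollout.
    """
    total = sum(conservative_cost(costs, kappa) for costs in step_predictions)
    return total <= budget


if __name__ == "__main__":
    rollout = [[0.0, 2.0], [1.0, 1.0]]
    print(satisfies_conservative_constraint(rollout, kappa=0.0, budget=2.5))
    print(satisfies_conservative_constraint(rollout, kappa=1.0, budget=2.5))
```

With `kappa = 0` the check reduces to the plain model-predicted constraint. A larger `kappa` rejects rollouts whose predicted cost looks acceptable only because the model is uncertain, which is the intuition behind requiring the conservative constraint rather than the nominal one.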
Related papers
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- CUP: A Conservative Update Policy Algorithm For Safe Reinforcement Learning (2022)
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- ActSafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Safety Correction From Baseline: Towards The Risk-aware Policy In Robotics Via Dual-agent Reinforcement Learning (2022)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Context-aware Safe Reinforcement Learning For Non-stationary Environments (2021)