Autocost: Evolving Intrinsic Cost For Zero-violation Reinforcement Learning
2023 · Tairan He, Weiye Zhao, Changliu Liu
Abstract
Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs. Results show that the
Related papers
- Imitate The Good And Avoid The Bad: An Incremental Approach To Safe Reinforcement Learning (2023) 0.00
- Controlling Underestimation Bias In Constrained Reinforcement Learning For Safe Exploration (2026) 0.00
- Conservative And Adaptive Penalty For Model-based Safe Reinforcement Learning (2021) 0.00
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025) 0.00
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022) 5.24
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023) 0.00
- Solving Richly Constrained Reinforcement Learning Through State Augmentation And Reward Penalties (2023) 0.00
- Implicit Safe Set Algorithm For Provably Safe Reinforcement Learning (2024) 0.00