Enhancing Efficiency Of Safe Reinforcement Learning Via Sample Manipulation
2024 · Shangding Gu, Laixi Shi, Yuhao Ding, et al.
Abstract
Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization
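The three-mode idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no thresholds, update rules, or sample-size schedule, so the function names, the slack values around the cost limit, the 20% grow/shrink factors, and the projection-based balancing step are all assumptions chosen for illustration. Both gradients are taken as improvement directions (reward ascent and cost descent), and a negative dot product is treated as a conflict.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_mode(cost: float, limit: float,
                slack_low: float = -2.0, slack_high: float = 2.0) -> str:
    """Pick one of three modes from the estimated cost (hypothetical thresholds)."""
    if cost <= limit + slack_low:
        return "maximize_reward"   # comfortably safe: focus on reward
    if cost >= limit + slack_high:
        return "minimize_cost"     # clearly unsafe: focus on cost
    return "balance"               # near the limit: trade off both


def gradients_conflict(g_reward: Sequence[float],
                       g_safety: Sequence[float]) -> bool:
    """Treat a negative dot product as a conflict (assumed criterion)."""
    return dot(g_reward, g_safety) < 0.0


def _remove_component(g: Sequence[float], other: Sequence[float]) -> List[float]:
    """Remove from g its component along `other`."""
    scale = dot(g, other) / dot(other, other)
    return [a - scale * b for a, b in zip(g, other)]


def balanced_direction(g_reward: Sequence[float],
                       g_safety: Sequence[float]) -> List[float]:
    """Average the two directions, projecting first when they conflict.

    Projection is a generic gradient-manipulation technique used here as a
    stand-in; the abstract does not state how the balance step is computed.
    """
    if gradients_conflict(g_reward, g_safety):
        r = _remove_component(g_reward, g_safety)
        s = _remove_component(g_safety, g_reward)
    else:
        r, s = list(g_reward), list(g_safety)
    return [(a + b) / 2.0 for a, b in zip(r, s)]


def next_sample_size(base: int, conflict: bool,
                     grow: float = 0.2, shrink: float = 0.2) -> int:
    """Collect more samples under conflict, fewer otherwise (assumed factors)."""
    factor = 1.0 + grow if conflict else 1.0 - shrink
    return max(1, int(round(base * factor)))


if __name__ == "__main__":
    g_r, g_s = [1.0, 0.0], [-1.0, 1.0]
    conflict = gradients_conflict(g_r, g_s)
    print(select_mode(cost=25.0, limit=25.0))
    print(balanced_direction(g_r, g_s))
    print(next_sample_size(1000, conflict))
```

In this sketch the sample budget grows only when the two directions disagree, which matches the abstract's description of adjusting sampling according to the observed conflict; the concrete numbers would need to come from the paper itself.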
Related papers
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025)
- Safe Policy Optimization With Local Generalized Linear Function Approximations (2021)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- ActSafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- Safety Correction From Baseline: Towards The Risk-aware Policy In Robotics Via Dual-agent Reinforcement Learning (2022)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)