Towards Safe Reinforcement Learning Via Constraining Conditional Value-at-risk
2022 · Chengyang Ying, Xinning Zhou, Hang Su, et al.
Abstract
Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most of the existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on the analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm of CVaR-Proximal-Policy-Optimization (CPPO) which formalizes the risk-sensitive constrained optim
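As a rough illustration of the two quantities the abstract names, the sketch below estimates CVaR from a sample of episode returns and computes the Value Function Range over a small table of state values. The function names, the tail level `alpha`, the lower-tail convention (mean of the worst `alpha` fraction of returns), and the sample numbers are all assumptions made for illustration. This is not the paper's formal definition of either quantity, and it is not the CPPO algorithm.

```python
import math


def empirical_cvar(returns, alpha):
    """Estimate lower-tail CVaR as the mean of the worst ceil(alpha * n) returns.

    Assumption: "risk" means low return, so the tail is taken from the
    smallest values. Other conventions (e.g. on costs) flip the tail.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending: worst returns first
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / k


def value_function_range(state_values):
    """Gap between the best and the worst state value (VFR as described in the abstract)."""
    return max(state_values) - min(state_values)


# Hypothetical episode returns: mostly fine, with two bad outcomes.
returns = [10.0, 12.0, -50.0, 11.0, 9.0, 13.0, -5.0, 10.0, 12.0, 11.0]

mean_return = sum(returns) / len(returns)
cvar_20 = empirical_cvar(returns, alpha=0.2)   # averages the two worst returns
worst_case = min(returns)

# Hypothetical state values for a three-state problem.
vfr = value_function_range([1.5, -0.5, 3.0])

print(f"mean={mean_return:.2f} cvar_0.2={cvar_20:.2f} worst={worst_case:.2f} vfr={vfr:.2f}")
```

On this sample the CVaR estimate sits between the mean and the single worst return, which is the sense in which a CVaR criterion is less pessimistic than a worst-case one while still penalizing the bad tail.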
Related papers
- Constraint-conditioned Policy Optimization For Versatile Safe Reinforcement Learning (2023) · 0.00
- ACReL: Adversarial Conditional Value-at-risk Reinforcement Learning (2021) · 0.00
- Extreme Risk Mitigation In Reinforcement Learning Using Extreme Value Theory (2023) · 0.00
- Provably Efficient Iterated CVaR Reinforcement Learning With Function Approximation And Human Feedback (2023) · 0.00
- Robust Risk-sensitive Reinforcement Learning With Conditional Value-at-risk (2024) · 5.84
- Implicit Safe Set Algorithm For Provably Safe Reinforcement Learning (2024) · 0.00
- Improving Robustness Via Risk Averse Distributional Reinforcement Learning (2020) · 0.00
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022) · 5.24