Policy Bifurcation In Safe Reinforcement Learning
2024 · Wenjun Zou, Yao Lyu, Jie Li, et al.
Abstract
Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner. However, our research finds that in some scenarios the feasible policy should be discontinuous or multi-valued, since interpolating between discontinuous local optima inevitably leads to constraint violations. We are the first to identify the generating mechanism of this phenomenon and employ topological analysis to rigorously prove the existence of policy bifurcation in safe RL, which corresponds to the contractibility of the reachable tuple. Our theorem reveals that in scenarios where the obstacle-free state space is non-simply connected, a feasible policy is required to be bifurcated, meaning its output action needs to change abruptly in response to the varying state. To train such a bifurcated policy, we propose a safe RL algorithm cal
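The claim that interpolating between two feasible local optima can violate a constraint can be illustrated with a toy case. The sketch below is not taken from the paper: the disk obstacle, the start point, the steering angle, and the function names `clearance`, `bifurcated_policy`, and `blended_policy` are all assumptions chosen for illustration. An agent facing an obstacle head-on can pass above or below it, but the average of those two headings drives straight into it.

```python
import math

OBSTACLE_RADIUS = 1.0  # hypothetical disk obstacle centred at the origin


def clearance(start, heading):
    """Signed clearance between a straight-line path and the obstacle.

    The path is the ray leaving `start` along `heading` (radians).
    A negative value means the path enters the obstacle.
    """
    px, py = start
    dx, dy = math.cos(heading), math.sin(heading)
    # Parameter of the point on the ray closest to the origin (clamped to t >= 0).
    t = max(0.0, -(px * dx + py * dy))
    cx, cy = px + t * dx, py + t * dy
    return math.hypot(cx, cy) - OBSTACLE_RADIUS


def bifurcated_policy(y, steer=math.pi / 4):
    """Discontinuous policy: steer above the obstacle if y >= 0, else below."""
    return steer if y >= 0 else -steer


def blended_policy(y, steer=math.pi / 4, sharpness=1.0):
    """Continuous interpolation between the two branches (zero heading at y = 0)."""
    return steer * math.tanh(sharpness * y)


start = (-2.0, 0.0)  # agent directly in front of the obstacle

for name, policy in [("bifurcated", bifurcated_policy), ("blended", blended_policy)]:
    heading = policy(start[1])
    c = clearance(start, heading)
    status = "safe" if c > 0 else "collision"
    print(f"{name}: heading={heading:+.3f} rad, clearance={c:+.3f} ({status})")
```

In this toy setting both steering branches keep a positive clearance, while the continuous blend returns a zero heading on the symmetry line and collides. The discontinuity at y = 0 is what keeps the policy feasible here, which is the intuition behind the bifurcation result stated in the abstract.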
Related papers
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Feasible Policy Iteration For Safe Reinforcement Learning (2023)
- Safe Reinforcement Learning With Dual Robustness (2023)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- Safe Continual Reinforcement Learning In Non-stationary Environments (2026)
- Joint Learning Of Policy With Unknown Temporal Constraints For Safe Reinforcement Learning (2023)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)