Feasible Policy Iteration For Safe Reinforcement Learning
2023 · Yujie Yang, Zhilong Zheng, Shengbo Eben Li, et al.
Abstract
Safety is the priority concern when applying reinforcement learning (RL) algorithms to real-world control problems. While policy iteration provides a fundamental algorithm for standard RL, an analogous theoretical algorithm for safe RL remains absent. In this paper, we propose feasible policy iteration (FPI), the first foundational dynamic programming algorithm for safe RL. FPI alternates between policy evaluation, region identification, and policy improvement. This follows the actor-critic-scenery (ACS) framework, where the scenery refers to a feasibility function that represents a feasible region. A region-wise update rule is developed for the policy improvement step, which maximizes the state-value function inside the feasible region and minimizes the feasibility function outside it. With this update rule, FPI guarantees monotonic expansion of the feasible region, monotonic improvement of the state-value function, and geometric convergence to the optimal safe policy. Experimental results demonstrate that FPI
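The three alternating steps and the region-wise update rule can be illustrated on a toy problem. The following is a minimal tabular sketch, not the authors' implementation. The six-state chain, its rewards, the discount factor, and the specific feasibility function (a discounted "constraint decay" style quantity that is zero exactly when the current policy never reaches the hazard) are all assumptions made for illustration. The paper's own definitions may differ.

```python
import numpy as np

# Hypothetical toy problem: a 6-state chain. State 0 is an absorbing hazard.
N, GAMMA, TOL = 6, 0.9, 1e-9
LEFT, RIGHT = 0, 1
cost = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])    # 1 marks a constraint violation
reward = np.array([2.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # the hazard is tempting on purpose

def step(s, a):
    """Deterministic dynamics: move one cell, clip at the right end, stick at the hazard."""
    if s == 0:
        return 0
    return min(max(s + (1 if a == RIGHT else -1), 0), N - 1)

NEXT = np.array([[step(s, a) for a in (LEFT, RIGHT)] for s in range(N)])

def evaluate(policy, sweeps=2000):
    """Policy evaluation: fixed-point sweeps for the value V and the feasibility function F."""
    nxt = NEXT[np.arange(N), policy]
    V, F = np.zeros(N), np.zeros(N)
    for _ in range(sweeps):
        V = reward + GAMMA * V[nxt]
        # Assumed feasibility function: 1 at a violation, discounted otherwise,
        # so F(s) == 0 exactly when the policy never reaches the hazard from s.
        F = cost + (1.0 - cost) * GAMMA * F[nxt]
    return V, F

def improve(policy, V, F):
    """Region-wise improvement; the current action is kept unless another is strictly better."""
    feasible = F <= TOL  # region identification
    new = policy.copy()
    for s in range(N):
        for a in (LEFT, RIGHT):
            cur, cand = NEXT[s, new[s]], NEXT[s, a]
            if feasible[s]:
                # Inside the region: maximize value while staying in the region.
                q_cur = reward[s] + GAMMA * V[cur]
                q_cand = reward[s] + GAMMA * V[cand]
                better = feasible[cand] and q_cand > q_cur + TOL
            else:
                # Outside the region: minimize the feasibility function.
                better = F[cand] < F[cur] - TOL
            if better:
                new[s] = a
    return new

def feasible_policy_iteration(policy, max_iters=50):
    regions = []
    for _ in range(max_iters):
        V, F = evaluate(policy)
        regions.append(F <= TOL)
        new = improve(policy, V, F)
        if np.array_equal(new, policy):
            break
        policy = new
    return policy, V, F, regions

initial = np.array([LEFT, LEFT, LEFT, LEFT, RIGHT, RIGHT])
policy, V, F, regions = feasible_policy_iteration(initial)
print("final policy (0=left, 1=right):", policy)
print("feasible region per iteration:", [r.astype(int).tolist() for r in regions])
```

In this sketch the initial policy is feasible only in the two rightmost states. The recorded regions never shrink from one iteration to the next, and the final policy collects the reward next to the hazard without ever entering it, even though the hazard itself carries the largest reward.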
Related papers
- Policy Bifurcation In Safe Reinforcement Learning (2024)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- Implicit Safe Set Algorithm For Provably Safe Reinforcement Learning (2024)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Safe Policy Optimization With Local Generalized Linear Function Approximations (2021)
- Actsafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)