Safe Policy Optimization With Local Generalized Linear Function Approximations
2021 · Akifumi Wachi, Yunyue Wei, Yanan Sui
Abstract
Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems. Existing safe exploration methods guaranteed safety under the assumption of regularity, and it has been difficult to apply them to large-scale real problems. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety using generalized linear function approximations. We provide theoretical guarantees on its safety and optimality. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees, and 3) comparably sample-efficient and safer compared with existing advanced deep RL methods with safety constraints.
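As a rough illustration of what "learning the relation between a feature and safety with a generalized linear function approximation" can look like, here is a minimal sketch. It is not the SPO-LF algorithm from the paper. It assumes a logistic link, a ridge-regularized fit, and a simple confidence-width penalty; the names `fit_glm`, `safety_lower_bound`, `beta`, and the threshold 0.5 are hypothetical choices made for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_glm(features, targets, reg=1.0, iters=50):
    """Ridge-regularized logistic GLM fitted with Newton's method (illustrative)."""
    n, d = features.shape
    theta = np.zeros(d)
    for _ in range(iters):
        pred = sigmoid(features @ theta)
        # Gradient and Hessian of the regularized negative log-likelihood.
        grad = features.T @ (pred - targets) + reg * theta
        weights = pred * (1.0 - pred)
        hess = features.T @ (features * weights[:, None]) + reg * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def safety_lower_bound(theta, design, phi, beta=1.0):
    """Pessimistic safety estimate: shrink the linear score by a confidence width."""
    # Width is the norm of phi weighted by the inverse design matrix.
    width = np.sqrt(phi @ np.linalg.solve(design, phi))
    return sigmoid(phi @ theta - beta * width)

# Synthetic data: safety values generated from an assumed "true" parameter.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = sigmoid(X @ true_theta) + 0.05 * rng.normal(size=200)

theta_hat = fit_glm(X, y)
design = X.T @ X + np.eye(2)          # regularized design matrix
phi_new = np.array([1.0, 0.0])        # feature observed at a candidate state
point = sigmoid(phi_new @ theta_hat)  # plain GLM prediction
lower = safety_lower_bound(theta_hat, design, phi_new)
is_safe = lower >= 0.5                # hypothetical safety threshold
```

The design choice illustrated here is pessimism: a candidate state is treated as safe only if the lower confidence bound, not the point prediction, clears the threshold, so states with little supporting data are rejected until more observations narrow the width.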
Related papers
- Enhancing Efficiency Of Safe Reinforcement Learning Via Sample Manipulation (2024)
- Actsafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Safety Modulation: Enhancing Safety In Reinforcement Learning Through Cost-modulated Rewards (2025)
- Safe-support Q-learning: Learning Without Unsafe Exploration (2026)
- Feasible Policy Iteration For Safe Reinforcement Learning (2023)
- Provably Optimal Reinforcement Learning Under Safety Filtering (2025)
- Low-switching Policy Gradient With Exploration Via Online Sensitivity Sampling (2023)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)