Provably Safe PAC-MDP Exploration Using Analogies
2020 · Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
Abstract
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in sample efficiency, when compa
Related papers
- Probabilistic Counterexample Guidance For Safer Reinforcement Learning (Extended Version) (2023)
- DOPE: Doubly Optimistic And Pessimistic Exploration For Safe Reinforcement Learning (2021)
- ActSafe: Active Exploration With Safety Constraints For Reinforcement Learning (2024)
- Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe In The Face Of Uncertainty (2026)
- Implicit Safe Set Algorithm For Provably Safe Reinforcement Learning (2024)
- Safe Policy Optimization With Local Generalized Linear Function Approximations (2021)
- Safe Multi-Agent Reinforcement Learning With Convergence To Generalized Nash Equilibrium (2024)
- Safe Reinforcement Learning For Constrained Markov Decision Processes With Stochastic Stopping Time (2024)