Certifying Safety In Reinforcement Learning Under Adversarial Perturbation Attacks
2022 · Junlin Wu, Hussein Sibai, Yevgeniy Vorobeychik
Abstract
Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these end-to-end policies have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying the robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to the true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maxim
Related papers
- On The Robustness Of Safe Reinforcement Learning Under Observational Perturbations (2022)
- On Assessing The Safety Of Reinforcement Learning Algorithms Using Formal Methods (2021)
- Safe Reinforcement Learning With Dual Robustness (2023)
- Robust Deep Reinforcement Learning Against Adversarial Perturbations On State Observations (2020)
- Provably Invincible Adversarial Attacks On Reinforcement Learning Systems: A Rate-distortion Information-theoretic Approach (2025)
- Regret-based Defense In Adversarial Reinforcement Learning (2023)
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)
- Robust Reinforcement Learning On State Observations With Learned Optimal Adversary (2021)