Certifiably Robust Reinforcement Learning Through Model-based Abstract Interpretation
2023 · Chenxi Yang, Greg Anderson, Swarat Chaudhuri
Abstract
We present a reinforcement learning (RL) framework in which the learned policy comes with a machine-checkable certificate of provable adversarial robustness. Our approach, called CAROL, learns a model of the environment. In each learning iteration, it uses the current version of this model and an external abstract interpreter to construct a differentiable signal for provable robustness. This signal is used to guide learning, and the abstract interpretation used to construct it directly leads to the robustness certificate returned at convergence. We give a theoretical analysis that bounds the worst-case accumulative reward of CAROL. We also experimentally evaluate CAROL on four MuJoCo environments with continuous state and action spaces. On these tasks, CAROL learns policies that, when contrasted with policies from the state-of-the-art robust RL algorithms, exhibit: (i) markedly enhanced certified performance lower bounds; and (ii) comparable performance under empirical adversarial attacks.
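The abstract does not say which abstract domain, environment model, or loss CAROL uses, so the following is only an illustrative sketch of the general idea of abstract interpretation for robustness. It assumes an interval (box) domain, a hypothetical two-layer ReLU policy with made-up toy weights, and an adversary that may shift each observation coordinate by at most `eps`. All names here (`affine_bounds`, `relu_bounds`, `policy_action_bounds`, `policy_action`) are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)
Matrix = List[List[float]]


def affine_bounds(W: Matrix, b: List[float], box: List[Interval]) -> List[Interval]:
    """Sound interval bounds for W @ x + b when each x[j] lies in box[j]."""
    out = []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, (l, u) in zip(row, box):
            # A non-negative weight keeps the ends in order; a negative one swaps them.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each end of the interval."""
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


def policy_action_bounds(obs, eps, W1, b1, W2, b2) -> List[Interval]:
    """Bound every action the toy policy can emit for observations within eps of obs."""
    box = [(x - eps, x + eps) for x in obs]
    hidden = relu_bounds(affine_bounds(W1, b1, box))
    return affine_bounds(W2, b2, hidden)


def policy_action(obs, W1, b1, W2, b2) -> List[float]:
    """Concrete (non-abstract) forward pass of the same toy policy."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, obs)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


# Toy weights, chosen only for illustration.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -0.25]
W2 = [[1.0, -2.0]]
b2 = [0.1]

bounds = policy_action_bounds([0.5, 0.25], 0.1, W1, b1, W2, b2)
print("certified action interval:", bounds[0])
```

Every action the toy policy produces for an observation inside the perturbation box is guaranteed to lie in the printed interval. Because the bound is built from sums, products, and `max`, it is differentiable almost everywhere in the weights, which is the property that lets a bound of this kind be used as a training signal. The interval is generally looser than the true range of actions.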
Related papers
- On The Robustness Of Safe Reinforcement Learning Under Observational Perturbations (2022)
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)
- Robust Reinforcement Learning On State Observations With Learned Optimal Adversary (2021)
- Safe Reinforcement Learning With Dual Robustness (2023)
- Certifying Safety In Reinforcement Learning Under Adversarial Perturbation Attacks (2022)
- CAMP In The Odyssey: Provably Robust Reinforcement Learning With Certified Radius Maximization (2025)
- Policy Smoothing For Provably Robust Reinforcement Learning (2021)
- Robust Adversarial Reinforcement Learning Via Bounded Rationality Curricula (2023)