SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents
2024 · Ethan Rathbun, Christopher Amato, Alina Oprea
Abstract
Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications, making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work, we explore a particularly stealthy form of training-time attack against RL: backdoor poisoning. Here, the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving that prior attacks cannot generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit. Using insights from our theoretical analysis, we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic
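To make the backdoor threat model concrete, the following is a minimal sketch of how an adversary who intercepts training transitions might poison one of them: with some probability it stamps a trigger into the observation and overwrites the reward so that the target action is favored whenever the trigger is present. This is a simplified, fixed-reward illustration of the general idea and is not the SleeperNets algorithm (whose description is truncated above). The names `PoisonConfig`, `apply_trigger`, and `poison_transition`, the trigger indices and value, the reward magnitude, and the poisoning rate are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PoisonConfig:
    # All fields are hypothetical illustration parameters, not values from the paper.
    target_action: int                          # action the adversary wants on trigger
    trigger_value: float = 1.0                  # value written into the trigger features
    trigger_indices: Tuple[int, ...] = (0, 1)   # observation features that carry the trigger
    reward_magnitude: float = 1.0               # fixed +/- reward used on poisoned steps
    poison_rate: float = 0.01                   # probability that a transition is poisoned


def apply_trigger(obs: List[float], cfg: PoisonConfig) -> List[float]:
    """Return a copy of obs with the trigger pattern stamped in (obs is not mutated)."""
    poisoned = list(obs)
    for i in cfg.trigger_indices:
        poisoned[i] = cfg.trigger_value
    return poisoned


def poison_transition(
    obs: List[float],
    action: int,
    reward: float,
    cfg: PoisonConfig,
    rng: random.Random,
) -> Tuple[List[float], float, bool]:
    """Possibly poison one (obs, action, reward) transition.

    With probability cfg.poison_rate, stamp the trigger into the observation and
    replace the reward with +c if the agent took the target action, else -c.
    Otherwise return the transition unchanged. Returns (obs, reward, was_poisoned).
    """
    if rng.random() >= cfg.poison_rate:
        return obs, reward, False
    if action == cfg.target_action:
        new_reward = cfg.reward_magnitude
    else:
        new_reward = -cfg.reward_magnitude
    return apply_trigger(obs, cfg), new_reward, True


if __name__ == "__main__":
    rng = random.Random(0)
    cfg = PoisonConfig(target_action=2, poison_rate=1.0)  # poison every step for the demo
    print(poison_transition([0.0, 0.0, 0.5], 2, 0.3, cfg, rng))
    # prints ([1.0, 1.0, 0.5], 1.0, True)
```

A fixed reward of the same size for every poisoned step is the simplest possible choice; the abstract's point is that attacks of this simple kind carry theoretical limitations, which is what motivates the authors' framework tying the adversary's goal to the agent's optimal policy.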
Related papers
- Adversarial Inception Backdoor Attacks Against Reinforcement Learning (2024)
- Beyond Training-time Poisoning: Component-level And Post-training Backdoors In Deep Reinforcement Learning (2025)
- Reward Poisoning In Reinforcement Learning: Attacks Against Unknown Learners In Unknown Environments (2021)
- Efficient Reward Poisoning Attacks On Online Deep Reinforcement Learning (2022)
- Black-box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning (2023)
- Vulnerability-aware Poisoning Mechanism For Online RL With Unknown Dynamics (2020)
- Policy Teaching In Reinforcement Learning Via Environment Poisoning Attacks (2020)
- Online Poisoning Attack Against Reinforcement Learning Under Black-box Environments (2024)