Adversarial Inception Backdoor Attacks Against Reinforcement Learning
2024 · Ethan Rathbun, Alina Oprea, Christopher Amato
Abstract
Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms to training-time backdoor poisoning attacks. The objectives of these attacks are twofold: induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment, while allowing the agent to solve its intended task during training. Prior attacks assume arbitrary control over the agent's rewards, inducing values far outside the environment's natural constraints. This results in brittle attacks that fail once the proper reward constraints are enforced. In this work, we therefore propose a new class of backdoor attacks against DRL which are the first to achieve state-of-the-art performance under strict reward constraints. These "inception" attacks manipulate the agent's training data -- inserting the trigger into prior observations and replacing high-return actions with those of the targeted adversarial behavior. We formally define these attacks and prove they
Related papers
- Beyond Training-time Poisoning: Component-level And Post-training Backdoors In Deep Reinforcement Learning (2025)
- Efficient Reward Poisoning Attacks On Online Deep Reinforcement Learning (2022)
- SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents (2024)
- TrojDRL: Trojan Attacks On Deep Reinforcement Learning Agents (2019)
- Black-box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning (2023)
- Reward Poisoning In Reinforcement Learning: Attacks Against Unknown Learners In Unknown Environments (2021)
- Snooping Attacks On Deep Reinforcement Learning (2019)
- Real-time Adversarial Perturbations Against Deep Reinforcement Learning Policies: Attacks And Defenses (2021)