Execute Order 66: Targeted Data Poisoning For Reinforcement Learning
2022 · Harrison Foley, Liam Fowl, Tom Goldstein, et al.
Abstract
Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have been successful via perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning that causes agent misbehavior only at specific target states, all while minimally modifying a small fraction of training observations and without assuming any control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
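As a rough illustration of the gradient-alignment idea mentioned in the abstract, the sketch below perturbs one training observation within a small bound so that the training gradient it produces points in the same direction as the gradient of an adversarial objective at a target state. This is not the authors' implementation. The toy linear softmax policy, the cross-entropy surrogate standing in for the victim's RL loss, the brute-force search over perturbations, and all names (`ce_grad`, `alignment_loss`, `best_perturbation`) are assumptions made for the example.

```python
import itertools
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_grad(W, x, a):
    """Gradient of -log softmax(W x)[a] w.r.t. W, flattened row by row.

    Assumption: a cross-entropy surrogate stands in for the real RL loss.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(logits)
    return [(p[k] - (1.0 if k == a else 0.0)) * xi
            for k in range(len(W)) for xi in x]


def alignment_loss(g_adv, g_train):
    """1 - cosine similarity: near 0 when the two gradients are aligned."""
    dot = sum(u * v for u, v in zip(g_adv, g_train))
    na = math.sqrt(sum(u * u for u in g_adv))
    nt = math.sqrt(sum(v * v for v in g_train))
    return 1.0 - dot / (na * nt + 1e-12)


def best_perturbation(W, obs, action, g_adv, eps):
    """Brute-force search over perturbations in {-eps, 0, eps}^d.

    A real attack would optimize the perturbation with gradients; the
    exhaustive search is only workable for this tiny toy problem.
    """
    best_delta, best_loss = None, float("inf")
    for signs in itertools.product((-1.0, 0.0, 1.0), repeat=len(obs)):
        delta = [eps * s for s in signs]
        x = [o + d for o, d in zip(obs, delta)]
        loss = alignment_loss(g_adv, ce_grad(W, x, action))
        if loss < best_loss:
            best_delta, best_loss = delta, loss
    return best_delta, best_loss


# Hypothetical toy setup: 2 actions, 2-dimensional observations.
W = [[0.2, -0.1], [0.0, 0.3]]
target_state, adv_action = [0.5, 1.0], 0   # attacker wants action 0 here
obs, action = [1.0, 0.5], 1                # one logged training sample
eps = 0.1                                  # per-coordinate perturbation bound

g_adv = ce_grad(W, target_state, adv_action)
clean = alignment_loss(g_adv, ce_grad(W, obs, action))
delta, poisoned = best_perturbation(W, obs, action, g_adv, eps)
print("alignment loss, clean observation:   ", clean)
print("alignment loss, poisoned observation:", poisoned)
print("perturbation:", delta)
```

Because the zero perturbation is among the candidates, the poisoned alignment loss can never exceed the clean one. The action and reward of the logged sample are left untouched, which mirrors the observation-only threat model described in the abstract.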
Related papers
- Online Poisoning Attack Against Reinforcement Learning Under Black-box Environments (2024)
- Black-box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning (2023)
- Efficient Reward Poisoning Attacks On Online Deep Reinforcement Learning (2022)
- Policy Teaching In Reinforcement Learning Via Environment Poisoning Attacks (2020)
- Reward Poisoning In Reinforcement Learning: Attacks Against Unknown Learners In Unknown Environments (2021)
- TrojDRL: Trojan Attacks On Deep Reinforcement Learning Agents (2019)
- Poisoning Deep Reinforcement Learning Agents With In-distribution Triggers (2021)
- Policy Poisoning In Batch Reinforcement Learning And Control (2019)