Adaptive Discounting Of Training Time Attacks
2024 · Ridhima Bector, Abhay Aradhya, Chai Quek, et al.
Abstract
Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. TTAs are no longer limited to simple disruption: constructive TTAs (C-TTAs) are now available, in which the attacker forces a specific target behaviour upon a training RL agent (the victim). However, even state-of-the-art C-TTAs focus on target behaviours that the victim could naturally adopt if not for a particular feature of the environment dynamics, which C-TTAs exploit. In this work, we show that a C-TTA is possible even when the target behaviour is un-adoptable due to both environment dynamics and non-optimality with respect to the victim objective(s). To find efficient attacks in this context, we develop a specialised flavour of the DDPG algorithm, which we term gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically alters the attack policy planning horizon based on the victim's current behaviour.
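The abstract does not spell out how the planning horizon is adapted, so the following is only an illustrative sketch of the general idea of behaviour-dependent discounting, not the gammaDDPG algorithm itself. The names `adaptive_gamma`, `td_target`, and `effective_horizon`, the bounds `gamma_min` and `gamma_max`, the normalised `distance` between the victim's current and target behaviour, and the direction of the mapping (a longer horizon when the victim is further from the target) are all assumptions made for illustration.

```python
def adaptive_gamma(distance, gamma_min=0.5, gamma_max=0.99):
    """Map a behaviour distance in [0, 1] to a discount factor.

    Assumption: a victim far from the target behaviour (distance near 1)
    gets a larger discount factor, i.e. a longer attacker planning horizon.
    """
    d = min(max(distance, 0.0), 1.0)  # clamp to [0, 1]
    return gamma_min + d * (gamma_max - gamma_min)


def effective_horizon(gamma):
    """Rough planning horizon implied by a discount factor: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def td_target(reward, next_q, distance, done=False):
    """One-step TD target whose discount depends on the victim's behaviour."""
    gamma = adaptive_gamma(distance)
    return reward + (0.0 if done else gamma * next_q)


if __name__ == "__main__":
    for dist in (0.0, 0.5, 1.0):
        g = adaptive_gamma(dist)
        print(f"distance={dist:.1f} gamma={g:.3f} horizon={effective_horizon(g):.1f}")
```

In a DDPG-style critic update, a target of this kind would replace the usual fixed-discount target; how the paper measures the victim's behaviour and chooses the discount is not stated in the abstract.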
Related papers
- Beyond Training-time Poisoning: Component-level And Post-training Backdoors In Deep Reinforcement Learning (2025)
- Adversarial Inception Backdoor Attacks Against Reinforcement Learning (2024)
- TrojDRL: Trojan Attacks On Deep Reinforcement Learning Agents (2019)
- Tactics Of Adversarial Attack On Deep Reinforcement Learning Agents (2017)
- Black-box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning (2023)
- Reinforcement Learning Under Threats (2018)
- RAT: Adversarial Attacks On Deep Reinforcement Agents For Targeted Behaviors (2024)
- Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation (2024)