RAT: Adversarial Attacks On Deep Reinforcement Agents For Targeted Behaviors
2024 · Fengshuo Bai, Runze Liu, Yali Du, et al.
Abstract
Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker's objectives, often bypassing traditional reward-based defenses. Prior methods have primarily focused on reducing cumulative rewards; however, rewards are typically too generic to capture complex safety requirements effectively. As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. RAT trains an intention policy that is explicitly aligned with human preferences, serving as a precise behavioral target for the adversary. Concurrently, an adversary manipulates the victim's policy to follow this target behavior. To enhance the effective
Related papers
- Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation (2024)
- Query-based Targeted Action-space Adversarial Policies On Deep Reinforcement Learning Agents (2020)
- Real-time Adversarial Perturbations Against Deep Reinforcement Learning Policies: Attacks And Defenses (2021)
- TrojDRL: Trojan Attacks On Deep Reinforcement Learning Agents (2019)
- Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
- Attacking And Defending Deep Reinforcement Learning Policies (2022)
- Tactics Of Adversarial Attack On Deep Reinforcement Learning Agents (2017)
- Robust Deep Reinforcement Learning Through Adversarial Attacks And Training: A Survey (2024)