Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation
2024 · Shojiro Yamabe, Kazuto Fukuchi, Jun Sakuma
Abstract
This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have some limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
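The abstract does not give the form of the time-discounted regularization, so the following is only an illustrative sketch of the general idea: penalize how much the policy's action distribution shifts under a small observation perturbation, and weight early time steps more heavily. The function names, the uniform random perturbation, the total-variation distance, and the `epsilon` and `discount` parameters are all assumptions made for this example, not the authors' formulation.

```python
import numpy as np


def time_discounted_penalty(policy, states, epsilon=0.1, discount=0.9, seed=0):
    """Discount-weighted sensitivity of `policy` to bounded observation noise.

    policy: callable mapping a state vector to an action-probability vector.
    states: array of shape (T, state_dim), one trajectory in time order.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for t, s in enumerate(states):
        # Assumed perturbation model: uniform noise in an L-infinity ball.
        delta = rng.uniform(-epsilon, epsilon, size=s.shape)
        p_clean = policy(s)
        p_perturbed = policy(s + delta)
        # Total-variation distance between the two action distributions.
        tv = 0.5 * np.abs(p_clean - p_perturbed).sum()
        # Earlier steps (small t) receive larger weight discount**t.
        total += (discount ** t) * tv
    return total


def softmax_policy(weights):
    """Build a toy linear-softmax policy from a weight matrix."""
    def policy(s):
        logits = weights @ s
        z = np.exp(logits - logits.max())  # subtract max for stability
        return z / z.sum()
    return policy


if __name__ == "__main__":
    W = np.array([[2.0, -1.0], [-1.0, 2.0]])
    trajectory = np.ones((5, 2))
    print(time_discounted_penalty(softmax_policy(W), trajectory))
```

In a training loop, a term like this would typically be added to the task loss with a weighting coefficient; a random perturbation is used here only to keep the sketch short, whereas a worst-case perturbation would be the more usual choice for a robustness regularizer.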
Related papers
- RAT: Adversarial Attacks On Deep Reinforcement Agents For Targeted Behaviors (2024)
- Toward Evaluating Robustness Of Reinforcement Learning With Adversarial Policy (2023)
- Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
- Attacking And Defending Deep Reinforcement Learning Policies (2022)
- Optimal Attack And Defense For Reinforcement Learning (2023)
- Query-based Targeted Action-space Adversarial Policies On Deep Reinforcement Learning Agents (2020)
- Targeted Adversarial Attacks On Deep Reinforcement Learning Policies Via Model Checking (2022)
- Understanding Adversarial Attacks On Observations In Deep Reinforcement Learning (2021)