Beware Untrusted Simulators -- Reward-free Backdoor Attacks In Reinforcement Learning
2026 · Ethan Rathbun, Wo Wei Lin, Alina Oprea, et al.
Abstract
Simulated environments are key to the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision-making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. In this work we therefore highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions whenever the agent observes a predefined "trigger", leading to potentially dangerous consequences. Traditional backdoor attacks are limited by their strong threat models, which assume the adversary has near-full control over an agent's training pipeline and can both alter and observe the agent's rewards. As these assumptions are infeasible to implement within a simulator, we propo
Related papers
- Adversarial Inception Backdoor Attacks Against Reinforcement Learning (2024)
- Recover Triggered States: Protect Model Against Backdoor Attack In Reinforcement Learning (2023)
- Beyond Training-time Poisoning: Component-level And Post-training Backdoors In Deep Reinforcement Learning (2025)
- BAFFLE: Hiding Backdoors In Offline Reinforcement Learning Datasets (2022)
- SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents (2024)
- PolicyCleanse: Backdoor Detection And Mitigation In Reinforcement Learning (2022)
- Backdoor Attacks On Multiagent Collaborative Systems (2022)
- A Spatiotemporal Stealthy Backdoor Attack Against Cooperative Multi-agent Deep Reinforcement Learning (2024)