Deceptive Sequential Decision-making Via Regularized Policy Optimization
2025 · Yerin Kim, Alexander Benvenuti, Bo Chen, et al.
Abstract
Autonomous systems are increasingly expected to operate in the presence of adversaries, though adversaries may infer sensitive information simply by observing a system. Therefore, we present a deceptive sequential decision-making framework that not only conceals sensitive information, but actively misleads adversaries about it. We model autonomous systems as Markov decision processes, with adversaries using inverse reinforcement learning to recover reward functions. To counter them, we present three regularization strategies for policy synthesis problems that actively deceive an adversary about a system's reward. "Diversionary deception" leads an adversary to draw any false conclusion about the system's reward function. "Targeted deception" leads an adversary to draw a specific false conclusion about the system's reward function. "Equivocal deception" leads an adversary to infer that the real reward and a false reward both explain the system's behavior. We show how each form of dece
Related papers
- Online Learning Of Deceptive Policies Under Intermittent Observation (2025) 0.00
- Deceptive Reinforcement Learning In Model-free Domains (2023) 3.58
- Understanding Adversarial Attacks On Observations In Deep Reinforcement Learning (2021) 0.00
- Online Robust Policy Learning In The Presence Of Unknown Adversaries (2018) 0.00
- Toward Evaluating Robustness Of Reinforcement Learning With Adversarial Policy (2023) 4.52
- Robust Deep Reinforcement Learning Against Adversarial Behavior Manipulation (2024) 0.00
- Optimistic Policy Learning Under Pessimistic Adversaries With Regret And Violation Guarantees (2026) 0.00
- Preventing Imitation Learning With Adversarial Policy Ensembles (2020) 0.00