Online Learning Of Deceptive Policies Under Intermittent Observation
2025 · Gokul Puthumanaillam, Ram Padmanabhan, Jose Fuentes, et al.
Abstract
In supervisory control settings, autonomous systems are not monitored continuously. Instead, monitoring often occurs at sporadic intervals within known bounds. We study the problem of deception, where an agent pursues a private objective while remaining plausibly compliant with a supervisor's reference policy when observations occur. Motivated by the behavior of real, human supervisors, we situate the problem within Theory of Mind: the representation of what an observer believes and expects to see. We show that Theory of Mind can be repurposed to steer online reinforcement learning (RL) toward such deceptive behavior. We model the supervisor's expectations and distill from them a single, calibrated scalar -- the expected evidence of deviation if an observation were to happen now. This scalar combines how unlike the reference and current action distributions appear, with the agent's belief that an observation is imminent. Injected as a state-dependent weight into a KL-regularized policy
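The scalar described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's calibrated definition: the KL divergence as the measure of how unlike the two action distributions are, the simple product with the observation belief, the form of the penalized objective, and all function and parameter names.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions (clipped to avoid log(0))."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def expected_evidence(pi_current, pi_reference, p_observe):
    """Hypothetical stand-in for the 'expected evidence of deviation' scalar.

    Assumption: dissimilarity between the current and reference action
    distributions, scaled by the agent's belief that an observation is imminent.
    """
    return p_observe * kl_divergence(pi_current, pi_reference)


def regularized_objective(q_values, pi_current, pi_reference, weight):
    """Expected value under the current policy minus a weighted KL penalty.

    Assumption: one common shape for a KL-regularized objective at a single state;
    `weight` is the state-dependent coefficient.
    """
    expected_value = float(np.dot(pi_current, q_values))
    return expected_value - weight * kl_divergence(pi_current, pi_reference)


# Usage: the same deviation costs more when an observation is believed to be near.
pi_ref = [0.9, 0.1]        # supervisor's reference policy at this state
pi_cur = [0.5, 0.5]        # agent's current policy at this state
q_vals = [1.0, 3.0]        # agent's private action values

for p_obs in (0.05, 0.8):
    w = expected_evidence(pi_cur, pi_ref, p_obs)
    print(p_obs, w, regularized_objective(q_vals, pi_cur, pi_ref, w))
```

Under these assumptions the penalty vanishes when no observation is expected or when the agent already matches the reference, and grows as either the deviation or the observation belief increases.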
Related papers
- Deceptive Sequential Decision-making Via Regularized Policy Optimization (2025)
- When Your AIs Deceive You: Challenges Of Partial Observability In Reinforcement Learning From Human Feedback (2024)
- Deceptive Reinforcement Learning In Model-free Domains (2023)
- Deception In Social Learning: A Multi-agent Reinforcement Learning Perspective (2021)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- On The Structural Non-preservation Of Epistemic Behaviour Under Policy Transformation (2026)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)
- Online Robust Policy Learning In The Presence Of Unknown Adversaries (2018)