Sequence Model Imitation Learning With Unobserved Contexts
2022 · Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, et al.
Abstract
We consider imitation learning problems where the learner's ability to mimic the expert increases throughout the course of an episode as more information is revealed. One example of this is when the expert has access to privileged information: while the learner might not be able to accurately reproduce expert behavior early on in an episode, by considering the entire history of states and actions, they might be able to eventually identify the hidden context and act as the expert would. We prove that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods. This is because on-policy algorithms provably learn to recover from their initially suboptimal actions, while off-policy methods treat their suboptimal past actions as though they came from the expert. This often manifests as a latching behavior: a naive repetition of past actions. We conduct experimen
Related papers
- Toward The Fundamental Limits Of Imitation Learning (2020)
- Causal Imitation Learning With Unobserved Confounders (2022)
- The Pitfalls Of Imitation Learning When Actions Are Continuous (2025)
- Minimax Optimal Online Imitation Learning Via Replay Estimation (2022)
- Causal Confusion In Imitation Learning (2019)
- State-only Imitation With Transition Dynamics Mismatch (2020)
- Sequential Causal Imitation Learning With Unobserved Confounders (2022)
- Causal Imitation Learning Under Expert-observable And Expert-unobservable Confounding (2025)