Fully General Online Imitation Learning
2021 · Michael K. Cohen, Marcus Hutter, Neel Nanda
Abstract
In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. In general, one mistake during learning can lead to completely different events. In the special setting of environments that restart, existing work provides formal guidance in how to imitate so that events unfold similarly, but outside that setting, no formal guidance exists. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes, and we allow our imitator to learn online from the demonstrator. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted
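As an illustration of the "underestimate each action, query with the remaining probability" idea described in the abstract, here is a minimal sketch. It is not the paper's algorithm. It assumes a finite set of candidate demonstrator models with posterior weights, and it assumes that the underestimate is the pointwise minimum over the highest-weight models. The names `conservative_action_probs`, `act_or_query`, and `top_mass` are hypothetical.

```python
import random


def conservative_action_probs(model_probs, weights, top_mass=0.9):
    """Return (underestimates, query_prob).

    model_probs: list of dicts mapping action -> probability, one dict per
                 candidate demonstrator model (assumed finite, same actions).
    weights:     posterior weight of each model (assumed to sum to 1).
    top_mass:    hypothetical cutoff; keep the highest-weight models until
                 their combined weight reaches this value.
    """
    # Rank models by posterior weight, highest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += weights[i]
        if mass >= top_mass:
            break

    # Underestimate each action by the minimum probability any kept model gives it.
    actions = list(model_probs[0].keys())
    under = {a: min(model_probs[i][a] for i in kept) for a in actions}

    # Whatever probability is left over is spent on querying the demonstrator.
    query_prob = 1.0 - sum(under.values())
    return under, query_prob


def act_or_query(under, query_prob, rng):
    """Sample an action from the underestimates, or "QUERY" with the remainder."""
    r = rng.random()
    cumulative = 0.0
    for action, p in under.items():
        cumulative += p
        if r < cumulative:
            return action
    return "QUERY"


# Two hypothetical demonstrator models that disagree about the action distribution.
models = [{"left": 0.7, "right": 0.3}, {"left": 0.5, "right": 0.5}]
posterior = [0.6, 0.4]
under, query_prob = conservative_action_probs(models, posterior, top_mass=0.9)
choice = act_or_query(under, query_prob, random.Random(0))
```

In this sketch the query probability is larger when the retained models disagree, and it shrinks toward zero as the posterior concentrates on a single model, so the imitator asks for demonstrator data mainly where it is uncertain.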
Related papers
- A Bayesian Solution To The Imitation Gap (2024)
- Minimax Optimal Online Imitation Learning Via Replay Estimation (2022)
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- Invariant Causal Imitation Learning For Generalizable Policies (2023)
- Bayesian Robust Optimization For Imitation Learning (2020)
- Generative Adversarial Imitation Learning (2016)
- The Pitfalls Of Imitation Learning When Actions Are Continuous (2025)
- State-only Imitation With Transition Dynamics Mismatch (2020)