Provably Efficient Imitation Learning From Observation Alone
2019 · Wen Sun, Anirudh Vemula, Byron Boots, et al.
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically consider only tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also investig
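To make the Integral Probability Metric mentioned in the abstract concrete, here is a minimal sketch of an empirical IPM between two sets of observations: the largest gap in average discriminator value, taken over a finite discriminator class. The function name `empirical_ipm`, the clipped coordinate-projection discriminators, and the toy observations are all assumptions made for illustration; this is not the FAIL algorithm itself, which additionally optimizes a policy against such a gap at each time step.

```python
from typing import Callable, Sequence

Observation = Sequence[float]
Discriminator = Callable[[Observation], float]


def empirical_ipm(
    learner_obs: Sequence[Observation],
    expert_obs: Sequence[Observation],
    discriminators: Sequence[Discriminator],
) -> float:
    """Return max over f of |mean f(learner) - mean f(expert)|.

    Illustrative only: a finite discriminator class stands in for
    whatever function class an actual method would use.
    """
    if not learner_obs or not expert_obs or not discriminators:
        raise ValueError("all inputs must be non-empty")

    def mean(f: Discriminator, samples: Sequence[Observation]) -> float:
        return sum(f(x) for x in samples) / len(samples)

    return max(
        abs(mean(f, learner_obs) - mean(f, expert_obs))
        for f in discriminators
    )


# Hypothetical discriminator class: each coordinate, clipped to [-1, 1].
discriminators = [
    (lambda x, i=i: max(-1.0, min(1.0, x[i]))) for i in range(2)
]

# Toy observations (made up for this sketch).
expert = [(1.0, 0.0), (1.0, 0.0)]
learner_a = [(1.0, 0.0), (0.0, 0.0)]  # partly matches the expert
learner_b = [(0.0, 1.0), (0.0, 1.0)]  # does not match the expert

gap_a = empirical_ipm(learner_a, expert, discriminators)
gap_b = empirical_ipm(learner_b, expert, discriminators)
print(gap_a, gap_b)
```

A learner whose observations resemble the expert's yields a smaller gap, which is the quantity an IPM-minimizing learner would try to drive toward zero. Note that the estimate uses observations only, never expert actions.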
Related papers
- Imitation Learning From Observation Through Optimal Transport (2023) 2.26
- Provably Efficient Adversarial Imitation Learning With Unknown Transitions (2023) 0.00
- Toward The Fundamental Limits Of Imitation Learning (2020) 0.00
- Proximal Point Imitation Learning (2022) 0.00
- DEALIO: Data-efficient Adversarial Learning For Imitation From Observation (2021) 5.24
- State-only Imitation With Transition Dynamics Mismatch (2020) 0.00
- Imitation Learning From Observation With Automatic Discount Scheduling (2023) 0.00
- Towards Generalisable Imitation Learning Through Conditioned Transition Estimation And Online Behaviour Alignment (2026) 0.00