Imitation Learning From Observations By Minimizing Inverse Dynamics Disagreement
2019 · Chao Yang, Xiaojian Ma, Wenbing Huang, et al.
Abstract
This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical because it can leverage previously inapplicable resources (e.g. videos), yet it is more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and how it differs from LfD from both theoretical and practical perspectives. We first prove that the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert, when following the modeling approach of GAIL. More importantly, this gap is upper-bounded by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Considerable empirical results on challenging benchma
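The claim that the LfD/LfO gap is an inverse dynamics disagreement can be checked numerically on a toy problem. The sketch below is illustrative only and is not the authors' implementation. It assumes KL divergence (the abstract refers to GAIL-style modeling, which uses a different divergence), a small tabular setting, and transition dynamics shared by imitator and expert. Under those assumptions, the chain rule for KL gives KL(ρ_π(s,a) ‖ ρ_E(s,a)) = KL(ρ_π(s,s') ‖ ρ_E(s,s')) + E[KL(ρ_π(a|s,s') ‖ ρ_E(a|s,s'))]. All names (`random_dist`, `kl_terms`, `rho_pi`, `rho_e`, `T`) are hypothetical, and the two occupancy measures are simply random positive distributions rather than ones induced by actual policies.

```python
import math
import random


def random_dist(n, rng):
    """Return a strictly positive probability vector of length n."""
    weights = [rng.random() + 0.1 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_terms(rho_pi, rho_e, T):
    """Compute the three KL terms for occupancies rho[s][a] and dynamics T[s][a][s2].

    Returns (lfd, lfo, idd):
      lfd: KL between state-action distributions (the LfD objective, assumed KL).
      lfo: KL between state-transition distributions (the state-only objective).
      idd: expected KL between inverse dynamics rho(a | s, s2).
    """
    n_s, n_a = len(T), len(T[0])
    triples = [(s, a, s2) for s in range(n_s) for a in range(n_a) for s2 in range(n_s)]

    # Joint distributions over (s, a, s2) under the SAME dynamics T.
    joint_pi = {(s, a, s2): rho_pi[s][a] * T[s][a][s2] for (s, a, s2) in triples}
    joint_e = {(s, a, s2): rho_e[s][a] * T[s][a][s2] for (s, a, s2) in triples}

    # Marginalize out the action to get state-transition distributions.
    trans_pi, trans_e = {}, {}
    for (s, a, s2) in triples:
        trans_pi[(s, s2)] = trans_pi.get((s, s2), 0.0) + joint_pi[(s, a, s2)]
        trans_e[(s, s2)] = trans_e.get((s, s2), 0.0) + joint_e[(s, a, s2)]

    lfd = sum(
        rho_pi[s][a] * math.log(rho_pi[s][a] / rho_e[s][a])
        for s in range(n_s)
        for a in range(n_a)
    )
    lfo = sum(p * math.log(p / trans_e[key]) for key, p in trans_pi.items())

    # Inverse dynamics: rho(a | s, s2) = joint(s, a, s2) / trans(s, s2).
    idd = 0.0
    for (s, a, s2) in triples:
        inv_pi = joint_pi[(s, a, s2)] / trans_pi[(s, s2)]
        inv_e = joint_e[(s, a, s2)] / trans_e[(s, s2)]
        idd += joint_pi[(s, a, s2)] * math.log(inv_pi / inv_e)
    return lfd, lfo, idd


rng = random.Random(0)
n_s, n_a = 3, 2  # toy sizes, chosen arbitrarily


def random_occupancy():
    """Random positive distribution over (s, a), stored as rho[s][a]."""
    flat = random_dist(n_s * n_a, rng)
    return [flat[s * n_a:(s + 1) * n_a] for s in range(n_s)]


rho_pi = random_occupancy()
rho_e = random_occupancy()
T = [[random_dist(n_s, rng) for _ in range(n_a)] for _ in range(n_s)]

lfd, lfo, idd = kl_terms(rho_pi, rho_e, T)
print("LfD term:", lfd)
print("LfO term + inverse dynamics disagreement:", lfo + idd)
```

The two printed values should agree up to floating-point error, and the disagreement term is non-negative, so in this toy setting matching state transitions alone leaves a residual that only the inverse dynamics term accounts for.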
Related papers
- LobsDICE: Offline Learning From Observation Via Stationary Distribution Correction Estimation (2022)
- DEALIO: Data-efficient Adversarial Learning For Imitation From Observation (2021)
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- Good Better Best: Self-motivated Imitation Learning For Noisy Demonstrations (2023)
- State-only Imitation With Transition Dynamics Mismatch (2020)
- Provably Efficient Imitation Learning From Observation Alone (2019)
- Causal Transfer For Imitation Learning And Decision Making Under Sensor-shift (2020)
- On The Sample Efficiency Of Inverse Dynamics Models For Semi-supervised Imitation Learning (2026)