A Simple Solution For Offline Imitation From Observations And Examples With Possibly Incomplete Trajectories
2023 · Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
Abstract
Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize the divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address this issue, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms of the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite its simplicity, TAILO works well if there exist trajectorie
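The weighting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `trajectory_weights` and `weighted_bc_loss`, the discount value, and the assumption that each state already has a nonnegative score derived from the discriminator output are all hypothetical, since the abstract does not specify how the discriminator output is transformed.

```python
def trajectory_weights(scores, gamma=0.98):
    """Discounted sum of per-state scores along the remaining trajectory.

    weights[i] = sum over j >= 0 of gamma**j * scores[i + j]

    `scores` is assumed to hold one nonnegative value per state, derived
    from a discriminator that tries to identify expert states (assumption).
    """
    weights = [0.0] * len(scores)
    running = 0.0
    # Walk backwards so each weight reuses the sum computed for the next step.
    for i in reversed(range(len(scores))):
        running = scores[i] + gamma * running
        weights[i] = running
    return weights


def weighted_bc_loss(log_probs, weights):
    """Weighted behavior cloning loss: normalized negative log-likelihood.

    `log_probs[i]` is the policy's log-probability of the dataset action at
    step i; normalizing by the weight sum is an illustrative choice.
    """
    total = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


# Usage: a three-step trajectory segment whose first and last states look expert-like.
w = trajectory_weights([1.0, 0.0, 1.0], gamma=0.5)
loss = weighted_bc_loss([-1.0, -2.0, -0.5], w)
```

Because the sum runs over whatever future steps are present, a state that precedes expert-like states receives a large weight even if its own score is low, and the computation stays defined when the trajectory is only a fragment.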
Related papers
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- Offline Imitation Learning By Controlling The Effective Planning Horizon (2024)
- Mitigating Covariate Shift In Imitation Learning Via Offline Data Without Great Coverage (2021)
- DITTO: Offline Imitation Learning With World Models (2023)
- State-only Imitation With Transition Dynamics Mismatch (2020)
- Offline Imitation Learning With Suboptimal Demonstrations Via Relaxed Distribution Matching (2023)
- LobsDICE: Offline Learning From Observation Via Stationary Distribution Correction Estimation (2022)
- SafeMIL: Learning Offline Safe Imitation Policy From Non-preferred Trajectories (2025)