DITTO: Offline Imitation Learning With World Models
2023 · Branton Demoss, Paul Duckworth, Jakob Foerster, et al.
Abstract
For imitation learning algorithms to scale to real-world challenges, they must handle high-dimensional observations, offline learning, and policy-induced covariate shift. We propose DITTO, an offline imitation learning algorithm which addresses all three of these problems. DITTO optimizes a novel distance metric in the latent space of a learned world model. First, we train a world model on all available trajectory data. Then the imitation agent is unrolled from expert start states in the learned model and penalized for its latent divergence from the expert dataset over multiple time steps. We optimize this multi-step latent divergence using standard reinforcement learning algorithms, which provably induces imitation learning and empirically achieves state-of-the-art performance and sample efficiency on a range of Atari environments from pixels, without any online environment access. We also adapt other standard imitation learning algorithms to the world model setting, and show that
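To make the rollout-and-penalize step in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. The linear `latent_step` dynamics, the Euclidean `divergence`, and the toy policies are all hypothetical stand-ins for the learned world model, the paper's distance metric, and the imitation agent. It only shows how a per-step reward can be derived from latent divergence when the agent is unrolled from an expert start state. The reinforcement learning update that would consume these rewards is omitted.

```python
import math

def latent_step(z, a):
    """Hypothetical stand-in for learned latent dynamics: z' = f(z, a)."""
    return [0.9 * zi + 0.1 * a for zi in z]

def divergence(z_agent, z_expert):
    """Euclidean distance between latents (a placeholder, not the paper's metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(z_agent, z_expert)))

def rollout_rewards(policy, expert_latents):
    """Unroll the policy inside the model from the expert's start latent and
    reward each step with the negative divergence from the expert's latent."""
    z = list(expert_latents[0])  # begin at an expert start state
    rewards = []
    for z_expert in expert_latents[1:]:
        z = latent_step(z, policy(z))
        rewards.append(-divergence(z, z_expert))
    return rewards

# Toy expert trajectory: generated in the model by always taking action 1.0.
expert = [[0.0, 0.0]]
for _ in range(5):
    expert.append(latent_step(expert[-1], 1.0))

matching = rollout_rewards(lambda z: 1.0, expert)    # reproduces the expert
deviating = rollout_rewards(lambda z: -1.0, expert)  # drifts away from it
```

In this toy setup the policy that copies the expert's action incurs no penalty, while the deviating policy's penalty grows at every step. This is the multi-step signal that a standard reinforcement learning algorithm would then maximize.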
Related papers
- A Simple Solution For Offline Imitation From Observations And Examples With Possibly Incomplete Trajectories (2023)
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- Simudice: Offline Policy Optimization Through World Model Updates And DICE Estimation (2024)
- Learning From Random Demonstrations: Offline Reinforcement Learning With Importance-sampled Diffusion Models (2024)
- Offline Vs. Online Learning In Model-based RL: Lessons For Data Collection Strategies (2025)
- Mitigating Covariate Shift In Imitation Learning Via Offline Data Without Great Coverage (2021)
- Offline Trajectory Optimization For Offline Reinforcement Learning (2024)
- Offline Imitation Learning By Controlling The Effective Planning Horizon (2024)