Offline Imitation Learning With Suboptimal Demonstrations Via Relaxed Distribution Matching
2023 · Lantao Yu, Tianhe Yu, Jiaming Song, et al.
Abstract
Offline imitation learning (IL) promises the ability to learn performant policies from pre-collected demonstrations without interactions with the environment. However, imitating behaviors fully offline typically requires a large amount of expert data. To tackle this issue, we study the setting where we have limited expert data and supplementary suboptimal data. In this case, a well-known issue is the distribution shift between the learned policy and the behavior policy that collects the offline data. Prior works mitigate this issue by regularizing the KL divergence between the stationary state-action distributions of the learned policy and the behavior policy. We argue that such constraints based on exact distribution matching can be overly conservative and hamper policy learning, especially when the imperfect offline data is highly suboptimal. To resolve this issue, we present RelaxDICE, which employs an asymmetrically-relaxed f-divergence for explicit support regularization. Specifically, ins
Related papers
- Mitigating Covariate Shift In Imitation Learning Via Offline Data Without Great Coverage (2021)
- SoftDICE For Imitation Learning: Rethinking Off-policy Distribution Matching (2021)
- LobsDICE: Offline Learning From Observation Via Stationary Distribution Correction Estimation (2022)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- Strictly Batch Imitation Learning By Energy-based Distribution Matching (2020)
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- OptiDICE: Offline Policy Optimization Via Stationary Distribution Correction Estimation (2021)
- Offline Imitation Learning By Controlling The Effective Planning Horizon (2024)