Offline Imitation Learning By Controlling The Effective Planning Horizon
2024 · Hee-Jun Ahn, Seong-Woong Shim, Byung-Jun Lee
Abstract
In offline imitation learning (IL), we generally assume only a handful of expert trajectories and a supplementary offline dataset from suboptimal behaviors to learn the expert policy. While it is now common to minimize the divergence between state-action visitation distributions so that the agent also considers the future consequences of an action, a sampling error in an offline dataset may lead to erroneous estimates of state-action visitations in the offline case. In this paper, we investigate the effect of controlling the effective planning horizon (i.e., reducing the discount factor) as opposed to imposing an explicit regularizer, as previously studied. Unfortunately, it turns out that the existing algorithms suffer from magnified approximation errors when the effective planning horizon is shortened, which results in a significant degradation in performance. We analyze the main cause of the problem and provide the right remedies to correct the algorithm. We show that the corrected
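The abstract equates controlling the effective planning horizon with reducing the discount factor. As a minimal sketch of that relationship, and not the paper's algorithm, the snippet below uses the standard rule of thumb that a discount factor γ corresponds to an effective horizon of roughly 1/(1-γ) steps, together with the normalized per-step weights (1-γ)γ^t that appear in discounted visitation distributions. The function names are hypothetical and chosen only for illustration.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb effective planning horizon 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discount_weights(gamma: float, num_steps: int) -> list[float]:
    """Normalized weights (1 - gamma) * gamma**t for t = 0 .. num_steps - 1.

    Over an infinite horizon these weights sum to 1, so they describe how
    much each time step contributes to a discounted visitation distribution.
    """
    return [(1.0 - gamma) * gamma**t for t in range(num_steps)]


def mass_within(gamma: float, num_steps: int) -> float:
    """Total weight carried by the first num_steps steps: 1 - gamma**num_steps."""
    return 1.0 - gamma**num_steps


if __name__ == "__main__":
    for gamma in (0.99, 0.9, 0.5):
        horizon = effective_horizon(gamma)
        print(f"gamma={gamma}: effective horizon ~ {horizon:.1f} steps, "
              f"weight in first 10 steps = {mass_within(gamma, 10):.3f}")
```

Lowering γ concentrates the weight on the first few steps, which is the sense in which a smaller discount factor shortens the horizon over which future consequences of an action are considered.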
Related papers
- Mitigating Covariate Shift In Imitation Learning Via Offline Data Without Great Coverage (2021)
- A Simple Solution For Offline Imitation From Observations And Examples With Possibly Incomplete Trajectories (2023)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)
- Offline Imitation Learning With Suboptimal Demonstrations Via Relaxed Distribution Matching (2023)
- Is Behavior Cloning All You Need? Understanding Horizon In Imitation Learning (2024)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- Proximal Point Imitation Learning (2022)
- Minimax Optimal Online Imitation Learning Via Replay Estimation (2022)