Strictly Batch Imitation Learning By Energy-based Distribution Matching
2020 · Daniel Jarrett, Ioana Bica, Mihaela van der Schaar
Abstract
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e. respecting action conditionals), implicitly learn from rollout dynamics (i.e. leveraging state marginals), and -- crucially -- operate in an entirely offline fashion. To address this challenge, we propose a novel technique by *energy-based distribution matching* (EDM): By identifying parameterizations of the (discriminative) model of a pol
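The final sentence of the abstract is truncated, so the sketch below assumes a joint-energy-model (JEM) style reading of "identifying parameterizations of the (discriminative) model of a policy". Under that reading, a single logit function f(s) defines both the policy π(a|s) = softmax_a f(s)[a] and a state energy E(s) = -log Σ_a exp f(s)[a]. Training then combines a behavioral-cloning term on demonstrated actions with a contrastive energy term on demonstrated states. The toy linear logits, the function names, and the random stand-in data are all hypothetical. A real implementation would draw negative samples from the model itself, for example with Langevin dynamics; here they are simply passed in. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log-sum-exp over the last axis."""
    m = x.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=-1, keepdims=True)))[..., 0]

def logits(states, W):
    """Toy linear logit function f(s) = s @ W (hypothetical; one column per action)."""
    return states @ W

def policy_probs(states, W):
    """Discriminative view: pi(a|s) = softmax over actions of f(s)."""
    f = logits(states, W)
    return np.exp(f - logsumexp(f)[:, None])

def state_energy(states, W):
    """Generative view: E(s) = -logsumexp_a f(s)[a], i.e. -log p(s) up to a constant."""
    return -logsumexp(logits(states, W))

def bc_loss(states, actions, W):
    """Behavioral-cloning term: mean negative log-likelihood of demonstrated actions."""
    f = logits(states, W)
    log_pi = f - logsumexp(f)[:, None]
    return -log_pi[np.arange(len(actions)), actions].mean()

def state_term(demo_states, model_states, W):
    """Contrastive energy term: if model_states are held fixed as samples from
    p(s) proportional to exp(-E(s)), its gradient in W is the usual
    maximum-likelihood gradient for an energy-based model."""
    return state_energy(demo_states, W).mean() - state_energy(model_states, W).mean()

# Usage with random stand-in data (hypothetical shapes: 4-dim states, 3 actions).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
demo_s = rng.normal(size=(8, 4))      # stand-in for logged demonstration states
demo_a = rng.integers(0, 3, size=8)   # stand-in for logged demonstration actions
neg_s = rng.normal(size=(8, 4))       # placeholder for model (sampler) output
loss = bc_loss(demo_s, demo_a, W) + state_term(demo_s, neg_s, W)
print(float(loss))
```

In this sketch, both views share the same parameters W. As a result, the state term shapes the very logits that define the policy, which is one way state marginals can inform an explicitly parameterized policy without any interaction with the environment.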
Related papers
- Offline Imitation Learning With Suboptimal Demonstrations Via Relaxed Distribution Matching (2023)
- SoftDICE For Imitation Learning: Rethinking Off-policy Distribution Matching (2021)
- Efficient Offline Reinforcement Learning: First Imitate, Then Improve (2024)
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)
- Sampling From Energy-based Policies Using Diffusion (2024)
- OPIRL: Sample Efficient Off-policy Inverse Reinforcement Learning Via Distribution Matching (2021)
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022)