Deterministic And Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation For Sample Efficiency
2021 · Mingfei Sun, Sam Devlin, Katja Hofmann, et al.
Abstract
Sample efficiency is crucial for imitation learning methods to be usable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions can either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a determi
Related papers
- Discriminator-Actor-Critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)
- Mimicking Better By Matching The Approximate Action Distribution (2023)
- SoftDICE For Imitation Learning: Rethinking Off-policy Distribution Matching (2021)
- On The Sample Efficiency Of Inverse Dynamics Models For Semi-supervised Imitation Learning (2026)
- Adversarial Soft Advantage Fitting: Imitation Learning Without Policy Optimization (2020)
- Provably Efficient Off-policy Adversarial Imitation Learning With Convergence Guarantees (2024)
- DEALIO: Data-efficient Adversarial Learning For Imitation From Observation (2021)
- State-only Imitation With Transition Dynamics Mismatch (2020)