Mimicking Better By Matching The Approximate Action Distribution
2023 · João A. Cândido Ramos, Lionel Blondé, Naoya Takeishi, et al.
Abstract
In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the unavailability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert's state-state transitions; we regularize the imitator's policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance l
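The regularization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which divergence is used or how it is weighted, so the sketch assumes diagonal Gaussian action distributions, a KL divergence taken from the policy to the inverse dynamics model's prediction, and a fixed coefficient. The names `kl_diag_gaussians`, `regularized_loss`, and `coef` are hypothetical.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu_p: Sequence[float],
    std_p: Sequence[float],
    mu_q: Sequence[float],
    std_q: Sequence[float],
) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions.

    Assumption: p is the imitator's policy at an expert state s, and q is the
    action distribution an inverse dynamics model infers from the expert
    transition (s, s'). The direction of the KL is an illustrative choice.
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def regularized_loss(
    surrogate_loss: float, kl_terms: Sequence[float], coef: float = 1.0
) -> float:
    """Add the mean KL over a batch of expert transitions to an on-policy loss.

    Assumption: surrogate_loss is whatever policy loss the surrogate reward
    produces (adversarial, trajectory matching, optimal transport, ...), and
    coef is a hypothetical fixed regularization weight.
    """
    return surrogate_loss + coef * sum(kl_terms) / len(kl_terms)


# Two expert transitions with 2-dimensional actions (made-up numbers).
kl_a = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
kl_b = kl_diag_gaussians([0.5, -0.5], [0.3, 0.3], [0.5, -0.5], [0.3, 0.3])
loss = regularized_loss(surrogate_loss=2.0, kl_terms=[kl_a, kl_b], coef=0.5)
```

In this sketch the penalty vanishes when the policy already matches the inferred action distribution (as in the second transition) and grows as the two distributions drift apart, which is the alignment behavior the abstract describes.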
Related papers
- On Discovering Algorithms For Adversarial Imitation Learning (2025)
- Online Adaptation For Enhancing Imitation Learning Policies (2024)
- Discriminator-actor-critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)
- Action Inference By Maximising Evidence: Zero-shot Imitation From Observation With World Models (2023)
- Deterministic And Discriminative Imitation (d2-imitation): Revisiting Adversarial Imitation For Sample Efficiency (2021)
- A New Framework For Query Efficient Active Imitation Learning (2019)
- State-only Imitation With Transition Dynamics Mismatch (2020)
- A Simple Solution For Offline Imitation From Observations And Examples With Possibly Incomplete Trajectories (2023)