Adversarial Soft Advantage Fitting: Imitation Learning Without Policy Optimization
2020 · Paul Barde, Julien Roy, Wonseok Jeon, et al.
Abstract
Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator. This alternated optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator's iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator's policy. Consequently, our discriminator's update solves the generator's optimization problem for free: learning a policy that imitates the expert does not require an additional optimization loop. This formulation effectively cuts by half the implementation and com
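The claim that optimizing the discriminator already yields the imitation policy can be illustrated on a toy problem. The sketch below is a minimal illustration, not the authors' implementation. It assumes a single-state setting with three discrete actions, and it assumes the discriminator takes the form D(a) = π̃(a) / (π̃(a) + π_G(a)), where π̃ is the learnable policy and π_G is the previous generator's policy. This form is one reading of "conditioned on two policies" and is not stated in the abstract. All names (`p_expert`, `pi_g`, `fit_discriminator`) are hypothetical, and the expert and generator distributions are made-up numbers.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action logits.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def discriminator(pi_tilde, pi_g):
    # Assumed structured form: D(a) = pi_tilde(a) / (pi_tilde(a) + pi_g(a)).
    return pi_tilde / (pi_tilde + pi_g)

def expected_loss(theta, p_expert, pi_g):
    # Exact binary cross-entropy: expert actions labeled 1, generated actions labeled 0.
    d = discriminator(softmax(theta), pi_g)
    return -(p_expert * np.log(d)).sum() - (pi_g * np.log(1.0 - d)).sum()

def loss_grad(theta, p_expert, pi_g):
    # Analytic gradient of expected_loss with respect to the logits theta.
    pi_tilde = softmax(theta)
    d = discriminator(pi_tilde, pi_g)
    c = pi_g * d - p_expert * (1.0 - d)  # dLoss / dlog pi_tilde(a)
    return c - pi_tilde * c.sum()        # chain rule through the softmax

def fit_discriminator(p_expert, pi_g, lr=1.0, steps=5000):
    # Plain gradient descent on the discriminator loss only; no RL step anywhere.
    theta = np.zeros_like(p_expert)
    for _ in range(steps):
        theta -= lr * loss_grad(theta, p_expert, pi_g)
    return softmax(theta)

p_expert = np.array([0.7, 0.2, 0.1])  # made-up expert action distribution
pi_g = np.full(3, 1.0 / 3.0)          # made-up previous generator policy (uniform)
pi_learned = fit_discriminator(p_expert, pi_g)
print(np.round(pi_learned, 3))        # compare against p_expert
```

In this toy setting the only stationary point of the loss is π̃ = p_expert, so training the discriminator alone recovers the expert's action distribution. The sequential, trajectory-level case described in the abstract involves more than this sketch covers.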
Related papers
- Generative Adversarial Imitation Learning (2016)
- Preventing Imitation Learning With Adversarial Policy Ensembles (2020)
- Imitating Opponent To Win: Adversarial Policy Imitation Learning In Two-player Competitive Games (2022)
- Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning (2020)
- Discriminator-actor-critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)
- Non-adversarial Imitation Learning And Its Connections To Adversarial Methods (2020)
- Self-supervised Adversarial Imitation Learning (2023)
- Online Adaptation For Enhancing Imitation Learning Policies (2024)