Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL)
2024 · Noah Ford, Ryan W. Gardner, Austin Juhl, et al.
Abstract
Machine-learning paradigms such as imitation learning and reinforcement learning can generate highly performant agents in a variety of complex environments. However, commonly used methods require large quantities of data and/or a known reward function. This paper presents a method called Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) that employs a novel reward structure to improve the performance of imitation-learning agents that have access to only a handful of expert demonstrations. CMZ-DRIL uses reinforcement learning to minimize uncertainty among an ensemble of agents trained to model the expert demonstrations. This method does not use any environment-specific rewards, but creates a continuous and mean-zero reward function from the action disagreement of the agent ensemble. As demonstrated in a waypoint-navigation environment and in two MuJoCo environments, CMZ-DRIL can generate performant agents that behave more similarly to the expert than primary pr
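The reward idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes disagreement is measured as the variance of the ensemble's predicted actions, and that rewards are centered by subtracting the batch mean. The function names and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def disagreement(actions):
    """Mean per-dimension variance of the ensemble's actions for one state.

    actions: list of action vectors, one per ensemble member.
    """
    dims = list(zip(*actions))  # regroup values by action dimension
    return fmean([pvariance(d) for d in dims])


def mean_zero_rewards(batch_actions):
    """Map a batch of ensemble predictions to continuous, mean-zero rewards.

    Lower disagreement gives a higher reward. Subtracting the batch mean
    centers the rewards at zero (an assumed normalization).
    """
    raw = [-disagreement(a) for a in batch_actions]
    center = fmean(raw)
    return [r - center for r in raw]


# Three states, each with predictions from a hypothetical 3-member ensemble.
batch = [
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],    # full agreement
    [[0.0, 1.0], [0.3, 0.7], [-0.3, 1.3]],   # mild disagreement
    [[1.0, 0.0], [-1.0, 2.0], [0.0, 1.0]],   # strong disagreement
]
rewards = mean_zero_rewards(batch)
```

In this sketch, states where the ensemble agrees receive positive reward and states where it disagrees receive negative reward, so a reinforcement-learning agent maximizing the reward is pushed toward states the ensemble models confidently. No environment-specific reward is involved.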
Related papers
- RLIF: Interactive Imitation Learning As Reinforcement Learning (2023)
- Primal Wasserstein Imitation Learning (2020)
- Distance-rank Aware Sequential Reward Learning For Inverse Reinforcement Learning With Sub-optimal Demonstrations (2023)
- On Discovering Algorithms For Adversarial Imitation Learning (2025)
- Discriminator-actor-critic: Addressing Sample Inefficiency And Reward Bias In Adversarial Imitation Learning (2018)
- Co-imitation Learning Without Expert Demonstration (2021)
- Continual Reinforcement Learning In 3D Non-stationary Environments (2019)
- Discriminator Soft Actor Critic Without Extrinsic Rewards (2020)