A New Framework For Query Efficient Active Imitation Learning
2019 · Daniel Hsu
Abstract
We seek to align an agent's policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge of the dynamics, reward function, or unsafe states. A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user's reward function with efficient queries. We build an adversarial generative model of states and a successor representation (SR) model, both trained on transition experience collected by the learning policy. Our method uses these models to select state-action pairs, asks the user to comment on their optimality or safety, and trains an adversarial neural network to predict the rewards. Unlike previous papers, which are almost all based on uncertainty sampling, the key idea is to actively and efficiently select state-action pair
Related papers
- Imitating Opponent To Win: Adversarial Policy Imitation Learning In Two-player Competitive Games (2022)
- Generative Adversarial Imitation Learning (2016)
- Blending Imitation And Reinforcement Learning For Robust Policy Improvement (2023)
- Adversarial Imitation Learning Via Random Search (2020)
- State-only Imitation With Transition Dynamics Mismatch (2020)
- Provably Efficient Adversarial Imitation Learning With Unknown Transitions (2023)
- Bayesian Robust Optimization For Imitation Learning (2020)
- RLIF: Interactive Imitation Learning As Reinforcement Learning (2023)