Provably Efficient Generative Adversarial Imitation Learning For Online And Offline Setting With Linear Function Approximation

Abstract

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves \(\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})\) regret. Here \(N_1\) represents the number of trajectories of the expert demonstration, \(d\) is the feature dimension, and \(K\) is the number of episodes. For offline GAIL, we propose a
