Provably Efficient Generative Adversarial Imitation Learning For Online And Offline Setting With Linear Function Approximation
2021 · Zhihan Liu, Yufeng Zhang, Zuyue Fu, et al.
Abstract
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves \(\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})\) regret. Here \(N_1\) represents the number of trajectories of the expert demonstration, \(d\) is the feature dimension, and \(K\) is the number of episodes. For offline GAIL, we propose a
Related papers
- When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence (2020)
- C-GAIL: Stabilizing Generative Adversarial Imitation Learning With Control Theory (2024)
- Non-adversarial Imitation Learning And Its Connections To Adversarial Methods (2020)
- Provably Efficient Adversarial Imitation Learning With Unknown Transitions (2023)
- Provably Efficient Off-policy Adversarial Imitation Learning With Convergence Guarantees (2024)
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)
- \(f\)-GAIL: Learning \(f\)-divergence For Generative Adversarial Imitation Learning (2020)