Abstract

Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of \(\widetilde{O} (H^{3/2} |S|/\epsilon)\) and interaction complexity of \(\widetilde{O} (H^{3} |S|^2 |A|/\epsilon^2)\). Here, \(H\) represents the planning horizon, \(|S|\) is the state space size, \(|A|\) is the action space size, and \(\ep
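
To get a feel for how the two stated rates scale, the sketch below evaluates only the leading terms \(H^{3/2} |S|/\epsilon\) and \(H^{3} |S|^2 |A|/\epsilon^2\). It is an illustration rather than part of MB-TAIL: the function names are hypothetical, and the constants and logarithmic factors hidden by \(\widetilde{O}\) are dropped, so the outputs are relative scalings and not actual sample counts.

```python
def expert_sample_scaling(H: int, S: int, eps: float) -> float:
    """Leading term H^{3/2} |S| / eps of the expert sample complexity.

    Constants and log factors hidden by the tilde-O are ignored (assumption).
    """
    return H ** 1.5 * S / eps


def interaction_scaling(H: int, S: int, A: int, eps: float) -> float:
    """Leading term H^3 |S|^2 |A| / eps^2 of the interaction complexity.

    Constants and log factors hidden by the tilde-O are ignored (assumption).
    """
    return H ** 3 * S ** 2 * A / eps ** 2


if __name__ == "__main__":
    H, S, A = 4, 10, 3
    for eps in (0.5, 0.25):
        print(eps, expert_sample_scaling(H, S, eps), interaction_scaling(H, S, A, eps))
```

Under these leading terms, halving \(\epsilon\) doubles the expert-sample term but quadruples the interaction term, which is the practical difference between the \(1/\epsilon\) and \(1/\epsilon^2\) dependence.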

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 0
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: xu2023provably
