A Few Expert Queries Suffices For Sample-efficient RL With Resets And Linear Value Approximation

Abstract

The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires \(\tilde\{\mathcal\{O\}\}(d)\) expert queries and a \(\texttt\{poly\}(d,H,|\mathcal\{A\}|,1/\epsilon)\) amount of exploratory samples to provably recover an \(\epsilon\)-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input. Compared to prior imitation learning

A Few Expert Queries Suffices For Sample-efficient RL With Resets And Linear Value Approximation

Abstract

Authors

Tags

Stats

Related papers