Breaking The Sample Size Barrier In Model-based Reinforcement Learning With A Generative Model

Abstract

This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider \(\gamma\)-discounted infinite-horizon Markov decision processes (MDPs) with state space \(\mathcal\{S\}\) and action space \(\mathcal\{A\}\). Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least \(\frac\{|\mathcal\{S\}||\mathcal\{A\}|\}\{(1-\gamma)^2\}\). The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of \(\frac\{|\mathcal\{S\}||\mathcal\{A\}|\}\{1-\gamma\}\) (modulo some log fa

Breaking The Sample Size Barrier In Model-based Reinforcement Learning With A Generative Model

Abstract

Authors

Tags

Stats

Related papers