Sample-efficient Reinforcement Learning For Linearly-parameterized Mdps With A Generative Model

Abstract

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space \(\mathcal\{S\}\) and the action space \(\mathcal\{A\}\) are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with \(|\mathcal\{S\}|\times|\mathcal\{A\}|\), which can be prohibitively large when \(\mathcal\{S\}\) or \(\mathcal\{A\}\) is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp.\(~\)Q-learning) provably learns an \(\epsilon\)-optimal policy (resp.\(~\)Q-function) with high probability as soon as the sample size exceeds the order of \(\frac\{K\}\{(1-\gamma)^\{3\}\epsilon^\{2\}\}\) (resp.\(~\)\(\frac\{K\}\{(1-\gamma)^\{4\}\epsilon^\{2\}\}\)), up to some logarithmic factor. Here \(K\) is

Sample-efficient Reinforcement Learning For Linearly-parameterized Mdps With A Generative Model

Abstract

Authors

Tags

Stats

Related papers