Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning

Abstract

Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with \(S\) states, \(A\) actions and horizon length \(H\), substantial progress has been achieved towards characterizing the minimax-optimal regret, which scales on the order of \(\sqrt\{H^2SAT\}\) (modulo log factors) with \(T\) the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g., \(S^6A^4 \,\mathrm\{poly\}(H)\) for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity \(O(SAH)\), that achieves near-optimal regret as soon as the sample size exceeds the order of \(SA\,\mathrm\{poly\}(H)\). In terms of this sam

Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning

Abstract

Authors

Tags

Stats

Related papers