Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning
2021 Β· Gen Li, Laixi Shi, Yuxin Chen, et al.
Abstract
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with \(S\) states, \(A\) actions and horizon length \(H\), substantial progress has been achieved towards characterizing the minimax-optimal regret, which scales on the order of \(\sqrt\{H^2SAT\}\) (modulo log factors) with \(T\) the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g., \(S^6A^4 \,\mathrm\{poly\}(H)\) for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity \(O(SAH)\), that achieves near-optimal regret as soon as the sample size exceeds the order of \(SA\,\mathrm\{poly\}(H)\). In terms of this sam
Authors
(none)
Tags
Stats
Related papers
- Optimistic Posterior Sampling For Reinforcement Learning With Few Samples And Tight Guarantees (2022)0.00
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023)0.00
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- Sample Efficient Reinforcement Learning With Partial Dynamics Knowledge (2023)3.58
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023)0.00
- A Provably Efficient Sample Collection Strategy For Reinforcement Learning (2020)0.00
- Why Is Posterior Sampling Better Than Optimism For Reinforcement Learning? (2016)0.00
- Minimax Regret Bounds For Reinforcement Learning (2017)0.00