Settling The Horizon-dependence Of Sample Complexity In Reinforcement Learning
2021 Β· Yuanzhi Li, Ruosong Wang, Lin F. Yang
Abstract
Recently there is a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length \(H\), previous work have shown that there is a probably approximately correct (PAC) algorithm that learns an \(O(1)\)-optimal policy using \(\mathrm\{polylog\}(H)\) episodes of environment interactions when the number of states and actions is fixed. It is yet unknown whether the \(\mathrm\{polylog\}(H)\) dependence is necessary or not. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only \(O(1)\) episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of ind
Authors
(none)
Tags
Stats
Related papers
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)0.00
- Nearly Horizon-free Offline Reinforcement Learning (2021)0.00
- The Effective Horizon Explains Deep RL Performance In Stochastic Environments (2023)3.42
- A Policy Gradient Approach For Finite Horizon Constrained Markov Decision Processes (2022)3.58
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- Sample And Oracle Efficient Reinforcement Learning For Mdps With Linearly-realizable Value Functions (2024)0.00
- Reward-free RL Is No Harder Than Reward-aware RL In Linear Markov Decision Processes (2022)0.00
- Episodic Reinforcement Learning In Finite Mdps: Minimax Lower Bounds Revisited (2020)0.00