Sample-efficient Reinforcement Learning For Linearly-parameterized Mdps With A Generative Model
2021 Β· Bingyan Wang, Yuling Yan, Jianqing Fan
Abstract
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space \(\mathcal\{S\}\) and the action space \(\mathcal\{A\}\) are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with \(|\mathcal\{S\}|\times|\mathcal\{A\}|\), which can be prohibitively large when \(\mathcal\{S\}\) or \(\mathcal\{A\}\) is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp.\(~\)Q-learning) provably learns an \(\epsilon\)-optimal policy (resp.\(~\)Q-function) with high probability as soon as the sample size exceeds the order of \(\frac\{K\}\{(1-\gamma)^\{3\}\epsilon^\{2\}\}\) (resp.\(~\)\(\frac\{K\}\{(1-\gamma)^\{4\}\epsilon^\{2\}\}\)), up to some logarithmic factor. Here \(K\) is
Authors
(none)
Tags
Stats
Related papers
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable Mdps With Limited Revisiting (2021)0.00
- Breaking The Sample Size Barrier In Model-based Reinforcement Learning With A Generative Model (2020)9.03
- Projection By Convolution: Optimal Sample Complexity For Reinforcement Learning In Continuous-space Mdps (2024)0.00
- Sample And Oracle Efficient Reinforcement Learning For Mdps With Linearly-realizable Value Functions (2024)0.00
- Model-based Reinforcement Learning With A Generative Model Is Minimax Optimal (2019)0.00
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)0.00
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00