Is Plug-in Solver Sample-efficient For Feature-based Reinforcement Learning?
2020 · Qiwen Cui, Lin F. Yang
Abstract
It is believed that a model-based approach to reinforcement learning (RL) is the key to reducing sample complexity. However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers the sample complexity of finding an \(\epsilon\)-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an \(\epsilon\)-optimal policy in a \(\gamma\)-discounted MDP is \(O\big(K/((1-\gamma)^3\epsilon^2)\big)\), which depends only on the dimensionality \(K\) of the feature space and has no dependence on the size of the state or action space. We further extend our results to
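The plug-in idea described in the abstract (estimate a model from generative-model samples, then hand it to any planner) can be illustrated with a minimal sketch. This is not the paper's feature-based algorithm: it assumes a small tabular MDP, uses value iteration as a stand-in for the arbitrary plug-in solver, and every name in it (`build_empirical_model`, `value_iteration`, `sample_next_state`, the toy `P_true` and `R`) is hypothetical.

```python
import numpy as np


def build_empirical_model(sample_next_state, n_states, n_actions, n_samples, rng):
    """Estimate P_hat[s, a, s'] from n_samples generative-model calls per (s, a)."""
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return P_hat / n_samples


def value_iteration(P, R, gamma, tol=1e-8):
    """Stand-in plug-in solver: plan in the given model, return (policy, values)."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # Q has shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


# Toy 2-state, 2-action MDP (assumed purely for illustration).
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])  # reward 1 whenever the current state is state 1
gamma = 0.9


def sample_next_state(s, a, rng):
    """Generative model: one draw of the next state from the true kernel."""
    return rng.choice(2, p=P_true[s, a])


rng = np.random.default_rng(0)
P_hat = build_empirical_model(sample_next_state, 2, 2, n_samples=2000, rng=rng)
policy, V = value_iteration(P_hat, R, gamma)  # plan in the empirical model only
```

The planner never sees `P_true`; it only sees `P_hat`, which is the sense in which the solver is "plugged in". In the feature-based setting of the paper, the number of samples needed would scale with the feature dimension \(K\) rather than with the number of state-action pairs, which this tabular sketch does not capture.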
Related papers
- Model-based Reinforcement Learning With A Generative Model Is Minimax Optimal (2019)
- Sample-efficient Reinforcement Learning For Linearly-parameterized MDPs With A Generative Model (2021)
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021)
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (2021)
- Sample And Oracle Efficient Reinforcement Learning For MDPs With Linearly-realizable Value Functions (2024)
- Projection By Convolution: Optimal Sample Complexity For Reinforcement Learning In Continuous-space MDPs (2024)
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)