Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting
2021 · Gen Li, Yuxin Chen, Yuejie Chi, et al.
Abstract
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates the linear realizability of the optimal Q-function (also called the "linear \(Q^{\star}\) problem"). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available but is unfortunately infeasible when we turn to standard online RL settings. In this paper, we make progress towards understanding this linear \(Q^{\star}\) problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and
Related papers
- Sample And Oracle Efficient Reinforcement Learning For MDPs With Linearly-realizable Value Functions (2024)
- Online RL In Linearly \(q^\pi\)-realizable MDPs Is As Easy As In Linear MDPs If You Learn What To Ignore (2023)
- Sample-efficient Reinforcement Learning For Linearly-parameterized MDPs With A Generative Model (2021)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (2021)
- Improved Sample Complexity For Reward-free Reinforcement Learning Under Low-rank MDPs (2023)
- Offline Reinforcement Learning Under Value And Density-ratio Realizability: The Power Of Gaps (2022)
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)
- Distributionally Robust Online Markov Game With Linear Function Approximation (2025)