Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Abstract

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates the linear realizability of the optimal Q-function (also called the "linear \(Q^{\star}\) problem"). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available but is unfortunately infeasible when we turn to standard online RL settings. In this paper, we make progress towards understanding this linear \(Q^{\star}\) problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and
