What Are The Statistical Limits Of Offline RL With Linear Function Approximation?

Abstract

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning, coupled with function approximation methods (to deal with the curse of dimensionality), can help alleviate the excessive sample complexity burden in modern sequential decision making problems. However, the extent to which this broader approach can be effective is not well understood; the literature largely consists of sufficient conditions. This work focuses on the basic question of which representational and distributional conditions are necessary for provably sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of *every* policy is linear in a given set of features and ii) our off-policy data has good coverage over all features (under a strong spect
