On Sample-efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, And Beyond
2024 · Thanh Nguyen-Tang, Raman Arora
Abstract
We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to *unify* three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that VS-based, RO-based, and PS-based algorithms, under standard assumptions, achieve *comparable* sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes with the standard assumptions. This result is surprising, given that the prior work suggested an unfavorable sampl
Related papers
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022)
- On The Sample Complexity Of Vanilla Model-based Offline Reinforcement Learning With Dependent Samples (2023)
- Sample Efficient Active Algorithms For Offline Reinforcement Learning (2026)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- Importance Of Empirical Sample Complexity Analysis For Offline Reinforcement Learning (2021)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- Sample Complexity Of Offline Reinforcement Learning With Deep ReLU Networks (2021)
- Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets (2023)