Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation
2021 · Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, et al.
Abstract
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited to safety-critical domains. Existing sample complexity guarantees for offline value function approximation methods typically require both (1) distributional assumptions (i.e., good coverage) and (2) representational assumptions (i.e., ability to represent some or all \(Q\)-value functions) stronger than what is required for supervised learning. However, the necessity of these conditions and the fundamental limits of offline RL are not well understood in spite of decades of research. This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and re
Related papers
- What Are The Statistical Limits Of Offline RL With Linear Function Approximation? (2020)
- Is Value Learning Really The Main Bottleneck In Offline RL? (2024)
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)
- Optimal Conservative Offline RL With General Function Approximation Via Augmented Lagrangian (2022)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- What Can Online Reinforcement Learning With Function Approximation Benefit From General Coverage Conditions? (2023)
- Offline Reinforcement Learning Under Value And Density-ratio Realizability: The Power Of Gaps (2022)
- Offline Reinforcement Learning: Role Of State Aggregation And Trajectory Data (2024)