Infinite-horizon Offline Reinforcement Learning With Linear Function Approximation: Curse Of Dimensionality And Algorithm
2021 Β· Lin Chen, Bruno Scherrer, Peter L. Bartlett
Abstract
In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime \(d\gamma^\{2\}>1\), where \(d\) is the dimension of the feature vector and \(\gamma\) is the discount rate. In this regime, for any \(q\in[\gamma^\{2\},1]\), we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is \(q/d\) and it requires \(Ξ©\left(\frac\{d\}\{\gamma^\{2\}\left(q-\gamma^\{2\}\right)\epsilon^\{2\}\}\exp\left(\Theta\left(d\gamma^\{2\}\right)\right)\right)\) samples to approximate the value function up to an additive error \(\epsilon\). Note that the lower bound of the sample complexity is exponential in \(d\). If \(q=\gamma^\{2\}\), even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most \(O\left(\max\left\\{ \frac\{\
Authors
(none)
Tags
Stats
Related papers
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)0.00
- What Are The Statistical Limits Of Offline RL With Linear Function Approximation? (2020)0.00
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)0.00
- Sample Complexity Of Offline Reinforcement Learning With Deep Relu Networks (2021)0.00
- High-probability Sample Complexities For Policy Evaluation With Linear Function Approximation (2023)0.00
- A Complete Characterization Of Linear Estimators For Offline Policy Evaluation (2022)0.00
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)0.00
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)0.00