A Complete Characterization Of Linear Estimators For Offline Policy Evaluation
2022 Β· Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, et al.
Abstract
Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle problems with complex, high-dimensional observations, there has been significant interest from theoreticians and practitioners alike in understanding the possibility of function approximation in reinforcement learning. Despite significant study, a sharp characterization of when we might expect offline policy evaluation to be tractable, even in the simplest setting of linear function approximation, has so far remained elusive, with a surprising number of strong negative results recently appearing in the literature. In this work, we identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods, in particular Fitted Q-iteration (FQI) and least squares temporal difference learning (LSTD), to su
Authors
(none)
Tags
Stats
Related papers
- What Are The Statistical Limits Of Offline RL With Linear Function Approximation? (2020)0.00
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)0.00
- Infinite-horizon Offline Reinforcement Learning With Linear Function Approximation: Curse Of Dimensionality And Algorithm (2021)0.00
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)0.00
- Pessimistic Nonlinear Least-squares Value Iteration For Offline Reinforcement Learning (2023)0.00
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)0.00
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)0.00
- Near-optimal Provable Uniform Convergence In Offline Policy Evaluation For Reinforcement Learning (2020)0.00