Bootstrapping Fitted Q-evaluation For Off-policy Inference
2021 · Botao Hao, Xiang Ji, Yaqi Duan, et al.
Abstract
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE), focusing on fitted Q-evaluation (FQE), which is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computational cost of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootstrapping method in classical RL environments for confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
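The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a tabular setting, where FQE with one-hot features reduces to repeated Bellman backups on empirical rewards and transitions, and it resamples whole episodes with replacement. The function names (`fqe_tabular`, `bootstrap_fqe`), the toy MDP, and the basic (pivotal) interval construction are illustrative assumptions; the subsampling speed-up mentioned in the abstract is not shown.

```python
import numpy as np

def fqe_tabular(episodes, pi, n_states, n_actions, gamma=0.9, n_iters=100):
    """Tabular FQE: iterate Bellman backups under target policy pi on empirical estimates."""
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    p_counts = np.zeros((n_states, n_actions, n_states))
    for ep in episodes:
        for s, a, r, s2 in ep:
            counts[s, a] += 1
            r_sum[s, a] += r
            p_counts[s, a, s2] += 1
    n = np.maximum(counts, 1)            # avoid division by zero for unvisited pairs
    r_hat = r_sum / n                    # empirical mean reward
    p_hat = p_counts / n[:, :, None]     # empirical transition probabilities
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = (pi * q).sum(axis=1)         # state value under pi
        q = r_hat + gamma * p_hat @ v    # one fitted backup
    return q

def bootstrap_fqe(episodes, pi, init_dist, n_states, n_actions,
                  gamma=0.9, n_boot=200, alpha=0.1, seed=0):
    """Bootstrap the FQE value estimate by resampling episodes with replacement."""
    rng = np.random.default_rng(seed)

    def value(eps):
        q = fqe_tabular(eps, pi, n_states, n_actions, gamma)
        return float(init_dist @ (pi * q).sum(axis=1))

    v_hat = value(episodes)
    k = len(episodes)
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, k, size=k)                  # resampled episode indices
        errs[b] = value([episodes[i] for i in idx]) - v_hat
    lo_q, hi_q = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    # Basic (pivotal) bootstrap interval; an assumption of this sketch.
    return v_hat, (v_hat - hi_q, v_hat - lo_q), errs

# Toy 2-state, 2-action MDP (hypothetical numbers), uniform behavior policy.
rng = np.random.default_rng(1)
S, A = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]])  # P[s, a, s']
R = np.array([[0.0, 1.0], [0.5, 0.0]])
episodes = []
for _ in range(50):
    s, ep = 0, []
    for _ in range(20):
        a = int(rng.integers(A))
        s2 = int(rng.choice(S, p=P[s, a]))
        ep.append((s, a, R[s, a], s2))
        s = s2
    episodes.append(ep)

pi = np.array([[0.2, 0.8], [0.7, 0.3]])   # target policy to evaluate
init = np.array([1.0, 0.0])               # initial state distribution
v_hat, ci, errs = bootstrap_fqe(episodes, pi, init, S, A)
print("estimate:", v_hat, "90% interval:", ci, "bootstrap std:", errs.std())
```

The array `errs` approximates the distribution of the evaluation error, so the same draws can give a variance estimate (`errs.var()`) and, if two evaluators are computed on each resample, an estimate of their correlation.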
Related papers
- Statistical Bootstrapping For Uncertainty Estimation In Off-policy Evaluation (2020)
- Off-policy Fitted Q-evaluation With Differentiable Function Approximators: Z-estimation And Inference Theory (2022)
- Bootstrapping With Models: Confidence Intervals For Off-policy Evaluation (2016)
- Fitted Q Evaluation Without Bellman Completeness Via Stationary Weighting (2025)
- Intrinsically Efficient, Stable, And Bounded Off-policy Evaluation For Reinforcement Learning (2019)
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)