Variance-aware Off-policy Evaluation With Linear Function Approximation
2021 · Yifei Min, Tianhao Wang, Dongruo Zhou, et al.
Abstract
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation. OPE aims to estimate the value function of a target policy from offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.
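The abstract describes VA-OPE only at a high level: run Fitted Q-Iteration for evaluation, estimate the conditional variance of the next-step value, and reweight each Bellman residual by the inverse of that variance. The NumPy sketch below illustrates this general idea; it is not the authors' implementation. Assumptions (all ours, not from the paper): a small random tabular MDP with one-hot features (a special case of a linear MDP), ridge parameter `lam = 1`, a two-way split of episodes (one half for variance estimation, the other for the weighted regression), and clipping the variance estimate to `[1, H**2]`. All function names are hypothetical.

```python
import numpy as np


def make_mdp(rng, S, A, H):
    """Random time-inhomogeneous tabular MDP (a linear MDP with one-hot features)."""
    P = rng.dirichlet(np.ones(S), size=(H, S, A))  # P[h, s, a] = next-state distribution
    R = rng.uniform(0.0, 1.0, size=(H, S, A))      # mean rewards in [0, 1]
    return P, R


def sample_rows(rng, probs):
    """Inverse-CDF sampling: one categorical draw per row of `probs`."""
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((probs.shape[0], 1))
    return np.minimum((u > cdf).sum(axis=1), probs.shape[1] - 1)  # guard float round-off


def collect(rng, P, R, mu, K):
    """K episodes under behavior policy mu; returns one (s, a, r, s') batch per step h."""
    H, S = P.shape[0], P.shape[1]
    s = rng.integers(S, size=K)                        # uniform initial states
    data = []
    for h in range(H):
        a = sample_rows(rng, mu[h, s])
        r = R[h, s, a] + 0.1 * rng.standard_normal(K)  # noisy observed rewards
        s_next = sample_rows(rng, P[h, s, a])
        data.append((s, a, r, s_next))
        s = s_next
    return data


def fqe(data, pi, S, A, lam=1.0, sigma2=None):
    """(Weighted) ridge Fitted Q-Evaluation; returns [V_0, ..., V_H] over states."""
    H, d = len(data), S * A
    V = [None] * H + [np.zeros(S)]                     # V_H = 0
    for h in reversed(range(H)):
        s, a, r, s_next = data[h]
        Phi = np.eye(d)[s * A + a]                     # one-hot features phi(s, a)
        wts = np.ones(len(s)) if sigma2 is None else 1.0 / sigma2[h][s, a]
        Lam = Phi.T @ (wts[:, None] * Phi) + lam * np.eye(d)
        y = r + V[h + 1][s_next]                       # Bellman target
        Q = np.linalg.solve(Lam, Phi.T @ (wts * y)).reshape(S, A)
        V[h] = (pi[h] * Q).sum(axis=1)                 # V_h(s) = sum_a pi(a|s) Q_h(s, a)
    return V


def estimate_variance(data, V, S, A, lam=1.0):
    """Ridge estimates of Var[V_{h+1}(s') | s, a], clipped to [1, H^2] (assumed)."""
    H, d = len(data), S * A
    sigma2 = []
    for h in range(H):
        s, a, _, s_next = data[h]
        Phi = np.eye(d)[s * A + a]
        Lam = Phi.T @ Phi + lam * np.eye(d)
        v = V[h + 1][s_next]
        m1 = np.linalg.solve(Lam, Phi.T @ v).reshape(S, A)     # ~ E[V_{h+1}(s') | s, a]
        m2 = np.linalg.solve(Lam, Phi.T @ v**2).reshape(S, A)  # ~ E[V_{h+1}(s')^2 | s, a]
        sigma2.append(np.clip(m2 - m1**2, 1.0, H**2))
    return sigma2


def variance_weighted_ope(data_var, data_eval, pi, S, A, lam=1.0):
    V_plain = fqe(data_var, pi, S, A, lam)             # split 1: unweighted FQE
    sigma2 = estimate_variance(data_var, V_plain, S, A, lam)
    V_w = fqe(data_eval, pi, S, A, lam, sigma2)        # split 2: variance-weighted FQE
    return V_w[0].mean()                               # uniform initial distribution


def true_value(P, R, pi):
    """Exact policy value by backward dynamic programming, for comparison."""
    V = np.zeros(P.shape[1])
    for h in reversed(range(P.shape[0])):
        Q = R[h] + P[h] @ V
        V = (pi[h] * Q).sum(axis=1)
    return V.mean()


rng = np.random.default_rng(0)
S, A, H, K = 5, 2, 5, 4000
P, R = make_mdp(rng, S, A, H)
mu = np.full((H, S, A), 1.0 / A)                       # uniform behavior policy
pi = rng.dirichlet(np.ones(A), size=(H, S))            # stochastic target policy
data = collect(rng, P, R, mu, K)
split1 = [tuple(x[: K // 2] for x in step) for step in data]
split2 = [tuple(x[K // 2 :] for x in step) for step in data]
est = variance_weighted_ope(split1, split2, pi, S, A)
truth = true_value(P, R, pi)
print(f"estimate = {est:.3f}, true value = {truth:.3f}")
```

In this sketch, splitting the episodes keeps the variance weights independent of the regression targets they reweight, and the clipping floor keeps the weights bounded. Both are simplifying choices made for illustration and are not necessarily how the paper handles these details.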
Related papers
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- A Maximum-entropy Approach To Off-policy Evaluation In Average-reward MDPs (2020)
- Accelerated And Instance-optimal Policy Evaluation With Linear Function Approximation (2021)
- More Efficient Off-policy Evaluation Through Regularized Targeted Learning (2019)
- An Instrumental Variable Approach To Confounded Off-policy Evaluation (2022)
- Variational Latent Branching Model For Off-policy Evaluation (2023)
- Near-optimal Offline Reinforcement Learning With Linear Representation: Leveraging Variance Information With Pessimism (2022)
- Future-dependent Value-based Off-policy Evaluation In POMDPs (2022)