Sample Complexity Of Nonparametric Off-policy Evaluation On Low-dimensional Manifolds Using Deep Networks
2022 · Xiang Ji, Minshuo Chen, Mengdi Wang, et al.
Abstract
We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy when the data are generated from an unknown behavior policy. We show that, by choosing the network size appropriately, one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high ambient data dimensionality. Specifically, we establish a sharp error bound for fitted Q-evaluation, which depends on the intrinsic dimension of the state-action space, the smoothness of the Bellman operator, and a function class-restricted \(\chi^2\)-divergence. It is noteworthy that the restricted \(\chi^2\)-divergence measures the behavior and target policies' mismatch in the function space, which can be small even if the two policies are not close to each other in their tabular
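As a rough illustration of the fitted Q-evaluation iteration the abstract refers to, the following sketch runs it on a hypothetical two-state, two-action MDP. It is not the paper's method as analyzed: the deep convolutional network regression is replaced by a tabular function class (so each regression step reduces to a per-pair average), a discounted setting is assumed, and the dynamics, rewards, policies, and all names (`step`, `fitted_q_evaluation`, `target_policy`, `initial_dist`) are made up for the example.

```python
# Minimal fitted Q-evaluation (FQE) sketch on a hypothetical 2-state, 2-action MDP.
# Assumptions (not from the paper): discounted setting, tabular function class,
# deterministic toy dynamics, and a dataset that covers every (state, action) pair.

N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.5


def step(state, action):
    """Toy dynamics: the action picks the next state; action 1 pays reward 1."""
    reward = 1.0 if action == 1 else 0.0
    return reward, action


def fitted_q_evaluation(transitions, target_policy, n_iters=60):
    """Repeatedly regress Bellman targets under the target policy onto (s, a).

    With a tabular class, the least-squares fit is the mean target per (s, a).
    """
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_iters):
        sums = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
        for s, a, r, s_next in transitions:
            # Expected next-step value under the TARGET policy, not the behavior policy.
            v_next = sum(target_policy[s_next][b] * q[s_next][b] for b in range(N_ACTIONS))
            sums[s][a] += r + GAMMA * v_next
            counts[s][a] += 1
        # Pairs never seen in the data keep their previous estimate.
        q = [
            [sums[s][a] / counts[s][a] if counts[s][a] else q[s][a] for a in range(N_ACTIONS)]
            for s in range(N_STATES)
        ]
    return q


# Offline data as a uniform behavior policy might produce it: every pair, five times.
transitions = []
for _ in range(5):
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            r, s_next = step(s, a)
            transitions.append((s, a, r, s_next))

# Target policy: always take action 1 (rows are states, columns are action probabilities).
target_policy = [[0.0, 1.0], [0.0, 1.0]]
initial_dist = [0.5, 0.5]

q_hat = fitted_q_evaluation(transitions, target_policy)
value_estimate = sum(
    initial_dist[s] * sum(target_policy[s][a] * q_hat[s][a] for a in range(N_ACTIONS))
    for s in range(N_STATES)
)
print(value_estimate)
```

In this toy setup the target policy earns reward 1 at every step, so its true discounted value is 1 / (1 - 0.5) = 2, and the estimate should approach that number even though the data come from a different (uniform) behavior policy. Swapping the per-pair average for a neural network regression gives the deep variant, at the cost of the approximation and sample-complexity questions the paper studies.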
Related papers
- Off-policy Fitted Q-evaluation With Differentiable Function Approximators: Z-estimation And Inference Theory (2022)
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- On The Expressivity Of Neural Networks For Deep Reinforcement Learning (2019)
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With \(\epsilon\)-greedy Exploration (2023)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- Interpretable Off-policy Evaluation In Reinforcement Learning By Highlighting Influential Transitions (2020)
- Fitted Q Evaluation Without Bellman Completeness Via Stationary Weighting (2025)