Fitted Q Evaluation Without Bellman Completeness Via Stationary Weighting
2025 · Lars van der Laan, Nathan Kallus
Abstract
Fitted Q-evaluation (FQE) is a foundational method for off-policy evaluation in reinforcement learning, but existing theory typically relies on Bellman completeness of the function class, a condition often violated in practice. This reliance is due to a fundamental norm mismatch: the Bellman operator is gamma-contractive in the L^2 norm induced by the target policy's stationary distribution, whereas standard FQE fits Bellman regressions under the behavior distribution. To resolve this mismatch, we reweight each Bellman regression step by an estimate of the stationary density ratio, inspired by emphatic weighting in temporal-difference learning. This makes the update behave as if it were performed under the target stationary distribution, restoring contraction without Bellman completeness while preserving the simplicity of regression-based evaluation. Illustrative experiments, including Baird's classical counterexample, show that stationary weighting can stabilize FQE under off-policy s
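The reweighting idea in the abstract can be sketched for a linear function class: each FQE iteration becomes a weighted least-squares regression of the Bellman target onto the features, with sample weights given by an estimated stationary density ratio. The sketch below is illustrative only and is not the authors' implementation. The function name `stationary_weighted_fqe`, its arguments, the small ridge term, and the assumption that the density-ratio estimates are supplied in `weights` are all hypothetical choices made for this example.

```python
import numpy as np


def stationary_weighted_fqe(phi, phi_next, rewards, weights, gamma,
                            n_iters=200, ridge=1e-8):
    """Weighted FQE with linear features (illustrative sketch).

    phi      : (n, d) features of the observed (s_i, a_i).
    phi_next : (n, d) features of (s'_i, a') averaged over the target policy.
    rewards  : (n,) observed rewards.
    weights  : (n,) estimated stationary density ratios (assumed given).
    gamma    : discount factor in [0, 1).
    """
    n, d = phi.shape
    theta = np.zeros(d)
    # The weighted Gram matrix does not change across iterations.
    # The ridge term is an assumption added only for numerical stability.
    gram = phi.T @ (weights[:, None] * phi) + ridge * np.eye(d)
    for _ in range(n_iters):
        # Bellman targets built from the current Q estimate.
        targets = rewards + gamma * (phi_next @ theta)
        # Weighted least-squares regression of the targets onto the features.
        theta = np.linalg.solve(gram, phi.T @ (weights * targets))
    return theta


# Toy usage: one state with a self-loop and reward 1, so Q = 1 / (1 - gamma).
phi_demo = np.ones((4, 1))
theta_demo = stationary_weighted_fqe(
    phi_demo, phi_demo, np.ones(4), np.ones(4), gamma=0.5
)
print(theta_demo)
```

Setting `weights` to all ones recovers ordinary unweighted FQE for this linear class, which makes the sketch a convenient way to compare the two updates on the same data.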
Related papers
- Off-policy Fitted Q-evaluation With Differentiable Function Approximators: Z-estimation And Inference Theory (2022)
- Hyperparameter Selection Methods For Fitted Q-evaluation With Error Guarantee (2022)
- Bootstrapping Fitted Q-evaluation For Off-policy Inference (2021)
- Minimax Weight And Q-function Learning For Off-policy Evaluation (2019)
- Sample Complexity Of Nonparametric Off-policy Evaluation On Low-dimensional Manifolds Using Deep Networks (2022)
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- State-action Similarity-based Representations For Off-policy Evaluation (2023)
- Q* Approximation Schemes For Batch Reinforcement Learning: A Theoretical Comparison (2020)