Hyperparameter Selection Methods For Fitted Q-evaluation With Error Guarantee
2022 · Kohei Miyaguchi
Abstract
We are concerned with the problem of hyperparameter selection for fitted Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline policy evaluation (OPE), which is essential to reinforcement learning without environment simulators. However, like other OPE methods, FQE is not itself hyperparameter-free, which undermines its utility in real-life applications. We address this issue by proposing a framework of approximate hyperparameter selection (AHS) for FQE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each of which has different characteristics such as distribution-mismatch tolerance and time complexity. We also confirm in experiments that the error bound given by the theory matches empirical observations.
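For readers unfamiliar with FQE, the following is a minimal sketch of the generic algorithm, not the paper's implementation. It assumes a small tabular problem with one-hot features and ridge regression as the function class. The function name `fitted_q_evaluation` and all of its parameters are hypothetical; `n_iters` and `ridge` merely stand in for the kind of hyperparameters the abstract refers to.

```python
import numpy as np

def fitted_q_evaluation(transitions, policy, n_states, n_actions,
                        gamma=0.9, n_iters=200, ridge=1e-6):
    """Tabular FQE sketch with one-hot features (illustrative only).

    transitions: list of (s, a, r, s_next) tuples from a logged dataset.
    policy: array of shape (n_states, n_actions) holding the target
        policy's action probabilities.
    n_iters, ridge: hypothetical hyperparameters of this sketch.
    """
    s, a, r, s_next = (np.array(x) for x in zip(*transitions))
    dim = n_states * n_actions
    # One-hot feature for each observed (state, action) pair.
    phi = np.zeros((len(s), dim))
    phi[np.arange(len(s)), s * n_actions + a] = 1.0
    gram = phi.T @ phi + ridge * np.eye(dim)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Bellman target under the target policy, using the current Q estimate.
        v_next = (policy[s_next] * q[s_next]).sum(axis=1)
        target = r + gamma * v_next
        # Regress the target onto the features (ridge least squares).
        w = np.linalg.solve(gram, phi.T @ target)
        q = w.reshape(n_states, n_actions)
    return q

# Toy dataset: state 0 loops on itself with reward 1; state 1 moves to state 0.
toy_transitions = [(0, 0, 1.0, 0), (1, 0, 0.0, 0)]
toy_policy = np.ones((2, 1))  # single action, taken with probability 1
q_hat = fitted_q_evaluation(toy_transitions, toy_policy, 2, 1, gamma=0.5)
```

In this toy setting the estimate can be checked by hand against the Bellman equation: Q(0) = 1 + 0.5 Q(0) and Q(1) = 0.5 Q(0). Changing `n_iters` or `ridge` changes the estimate, which is the sensitivity that motivates a principled selection procedure.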
Related papers
- Fitted Q Evaluation Without Bellman Completeness Via Stationary Weighting (2025)
- Off-policy Fitted Q-evaluation With Differentiable Function Approximators: Z-estimation And Inference Theory (2022)
- Towards Hyperparameter-free Policy Selection For Offline Reinforcement Learning (2021)
- Bootstrapping Fitted Q-evaluation For Off-policy Inference (2021)
- A Complete Characterization Of Linear Estimators For Offline Policy Evaluation (2022)
- Robust Fitted-q-evaluation And Iteration Under Sequentially Exogenous Unobserved Confounders (2023)
- Hyperparameter Optimization Can Even Be Harmful In Off-policy Learning And How To Deal With It (2024)
- State-action Similarity-based Representations For Off-policy Evaluation (2023)