Low Variance Off-policy Evaluation With State-based Importance Sampling
2022 Β· David M. Bossens, Philip S. Thomas
Abstract
In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data collected from a known behaviour policy. In this context, importance sampling estimators provide estimates for the expected return by weighting the trajectory based on the probability ratio of the target policy and the behaviour policy. Unfortunately, such estimators have a high variance and therefore a large mean squared error. This paper proposes state-based importance sampling estimators which reduce the variance by dropping certain states from the computation of the importance weight. To illustrate their applicability, we demonstrate state-based variants of ordinary importance sampling, weighted importance sampling, per-decision importance sampling, incremental importance sampling, doubly robust off-policy evaluation, and stationary density ratio estimat
Authors
(none)
Tags
Stats
Related papers
- Importance Sampling Policy Evaluation With An Estimated Behavior Policy (2018)0.00
- Sample Dropout: A Simple Yet Effective Variance Reduction Technique In Deep Policy Optimization (2023)0.00
- Behaviour Policy Optimization: Provably Lower Variance Return Estimates For Off-policy Reinforcement Learning (2025)0.00
- Relative Importance Sampling For Off-policy Actor-critic In Deep Reinforcement Learning (2018)2.26
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)0.00
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021)0.00
- Counterfactual-augmented Importance Sampling For Semi-offline Policy Evaluation (2023)0.00
- Efficient Policy Evaluation With Safety Constraint For Reinforcement Learning (2024)0.00