Efficient Evaluation Of Natural Stochastic Policies In Offline Reinforcement Learning
2020 · Nathan Kallus, Masatoshi Uehara
Abstract
We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation, where most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and improve policies' implementability in practice. Compared with the classic case of a pre-specified evaluation policy, the efficiency bound for natural stochastic policies, which measures the best-achievable estimation error, is inflated because the evaluation policy itself is unknown. In this paper, we derive the efficiency bounds of two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax
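To make the two policy types named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a tilting policy reweights the behavior policy as pi_e(a|s) proportional to u(a) * pi_b(a|s), and a modified treatment policy applies a shift function to the action the behavior policy actually took. The function names, the exponential tilt u(a) = exp(0.5 * a), and the shift a + 1 are hypothetical choices for illustration. The behavior probabilities are treated as known here, whereas in the offline setting they would have to be estimated from data.

```python
import math


def tilt_policy(behavior_probs, tilt_weights):
    """Tilt a discrete behavior policy at one state.

    Assumed form: pi_e(a|s) = u(a) * pi_b(a|s) / sum_a' u(a') * pi_b(a'|s).
    """
    unnormalized = [u * p for u, p in zip(tilt_weights, behavior_probs)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]


def modified_treatment_action(state, behavior_action, shift):
    """Return the action obtained by shifting the behavior policy's action.

    Assumed form: the evaluation policy plays shift(state, behavior_action).
    """
    return shift(state, behavior_action)


# Hypothetical behavior policy over three actions {0, 1, 2} at a single state.
behavior_probs = [0.5, 0.3, 0.2]

# Hypothetical tilt that favors larger action indices.
tilt_weights = [math.exp(0.5 * a) for a in range(3)]
tilted_probs = tilt_policy(behavior_probs, tilt_weights)

# Hypothetical shift that raises a continuous action (e.g. a dose) by one unit.
shifted_action = modified_treatment_action(None, 2.0, lambda s, a: a + 1.0)
```

In both cases the evaluation policy is a function of the behavior policy, so it only places mass where the behavior policy already does. This is one way to read the abstract's point about alleviating weak overlap.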
Related papers
- Doubly Robust Off-policy Value And Gradient Estimation For Deterministic Policies (2020)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)
- Intrinsically Efficient, Stable, And Bounded Off-policy Evaluation For Reinforcement Learning (2019)
- Efficient Policy Evaluation With Safety Constraint For Reinforcement Learning (2024)
- A Complete Characterization Of Linear Estimators For Offline Policy Evaluation (2022)
- Counterfactual-augmented Importance Sampling For Semi-offline Policy Evaluation (2023)
- Near-optimal Provable Uniform Convergence In Offline Policy Evaluation For Reinforcement Learning (2020)
- Behaviour Policy Optimization: Provably Lower Variance Return Estimates For Off-policy Reinforcement Learning (2025)