More Efficient Off-policy Evaluation Through Regularized Targeted Learning
2019 · Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, et al.
Abstract
We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their \(O_P(1/\sqrt{n})\) rate of convergence and characterizing their asymptotic distribution.
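For orientation, the OPE setting described in the abstract can be illustrated with plain trajectory-wise importance sampling. This is a minimal sketch of a standard baseline, not the doubly-robust targeted estimator the paper introduces. The function name `importance_sampling_value`, the `(state, action, reward)` trajectory layout, and the toy policies in the usage note are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: a trajectory is a list of (state, action, reward) steps.
Step = Tuple[int, int, float]
# A policy is represented as a function returning pi(action | state).
Policy = Callable[[int, int], float]


def importance_sampling_value(
    trajectories: Sequence[List[Step]],
    target: Policy,
    behavior: Policy,
    gamma: float = 1.0,
) -> float:
    """Estimate the target policy's value from data logged under the behavior policy.

    Each trajectory's discounted return is reweighted by the product of
    per-step probability ratios target / behavior, then averaged.
    Assumes behavior(s, a) > 0 for every logged (s, a) pair.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # running product of pi_target / pi_behavior
        ret = 0.0     # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= target(s, a) / behavior(s, a)
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)
```

As a usage example, take a behavior policy that picks one of two actions uniformly and a target policy that always picks action 1. With the logged one-step trajectories `[[(0, 0, 0.0)], [(0, 1, 1.0)]]`, the first trajectory gets weight 0 and the second gets weight 2, so the estimate is the average of 0 and 2. Estimators of this kind are unbiased but can have high variance over long horizons, which is the kind of inefficiency that doubly-robust and variance-reduced methods aim to address.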
Related papers
- Intrinsically Efficient, Stable, And Bounded Off-policy Evaluation For Reinforcement Learning (2019)
- Double Reinforcement Learning For Efficient Off-policy Evaluation In Markov Decision Processes (2019)
- Off-policy Evaluation In Doubly Inhomogeneous Environments (2023)
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)
- Kernel Metric Learning For In-sample Off-policy Evaluation Of Deterministic RL Policies (2024)
- Empirical Study Of Off-policy Policy Evaluation For Reinforcement Learning (2019)
- Efficiently Breaking The Curse Of Horizon In Off-policy Evaluation With Double Reinforcement Learning (2019)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)