Adaptive Doubly Robust Estimator From Non-stationary Logging Policy Under A Convergence Of Average Probability
2021 · Masahiro Kato
Abstract
Adaptive experiments, including efficient average treatment effect estimation and multi-armed bandit algorithms, have garnered attention in various applications, such as social experiments, clinical trials, and online advertisement optimization. This paper considers estimating the mean outcome of an action from samples obtained in adaptive experiments. In causal inference, the mean outcome of an action plays a crucial role, and estimating it is an essential task; average treatment effect estimation and off-policy value estimation are variants of this problem. In adaptive experiments, the probability of choosing an action (the logging policy) may be updated sequentially based on past observations. Because the logging policy depends on past observations, the samples are often not independent and identically distributed (i.i.d.), which makes it difficult to develop an asymptotically normal estimator. A typical approach for this problem is to assume that the logging policy converges in a
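To make the setting in the abstract concrete, the following is a minimal sketch of a doubly robust style estimate of an action's mean outcome from adaptively collected data. It is not the paper's estimator, whose definition is not part of the text above. The two-armed simulation, the epsilon-greedy logging policy, the running-mean outcome model, and the names `simulate_adaptive_experiment` and `adaptive_dr_estimate` are all assumptions made for illustration. The key point it tries to show is that both the logging probability and the outcome estimate used at round t depend only on rounds before t.

```python
import numpy as np


def simulate_adaptive_experiment(T=5000, true_means=(1.0, 0.5), eps=0.2, seed=0):
    """Simulate a hypothetical adaptive experiment (illustrative assumption).

    The logging policy is epsilon-greedy on running means, so it changes
    over time and depends on past observations only.
    """
    rng = np.random.default_rng(seed)
    K = len(true_means)
    actions = np.empty(T, dtype=int)
    rewards = np.empty(T)
    probs = np.empty((T, K))   # logging probabilities used at each round
    f_hat = np.zeros((T, K))   # outcome estimates built from rounds < t
    sums = np.zeros(K)
    counts = np.zeros(K)
    for t in range(T):
        # Outcome model: running mean per action, using past rounds only.
        est = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
        f_hat[t] = est
        # Logging policy: every action keeps probability at least eps / K.
        p = np.full(K, eps / K)
        p[np.argmax(est)] += 1.0 - eps
        probs[t] = p
        a = rng.choice(K, p=p)
        y = true_means[a] + rng.normal()
        actions[t] = a
        rewards[t] = y
        sums[a] += y
        counts[a] += 1
    return actions, rewards, probs, f_hat


def adaptive_dr_estimate(target, actions, rewards, probs, f_hat):
    """Average of per-round doubly robust scores for action `target`."""
    chosen = (actions == target).astype(float)
    residual = rewards - f_hat[:, target]
    scores = chosen * residual / probs[:, target] + f_hat[:, target]
    return scores.mean()


actions, rewards, probs, f_hat = simulate_adaptive_experiment()
for arm in (0, 1):
    print(arm, adaptive_dr_estimate(arm, actions, rewards, probs, f_hat))
```

Because the known logging probabilities enter through the inverse weight, each score is conditionally unbiased for the action's mean given the past, even when the running-mean outcome model is poor. In this sketch the outcome model only reduces variance.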
Related papers
- Doubly Robust Interval Estimation For Optimal Policy Evaluation In Online Learning (2021)
- Doubly Robust Off-policy Value And Gradient Estimation For Deterministic Policies (2020)
- Log-sum-exponential Estimator For Off-policy Evaluation And Learning (2025)
- Online Estimation And Inference For Robust Policy Evaluation In Reinforcement Learning (2023)
- Logarithmic Smoothing For Adaptive PAC-Bayesian Off-policy Learning (2025)
- Logarithmic Smoothing For Pessimistic Off-policy Evaluation, Selection And Learning (2024)
- Robust Fitted-Q-Evaluation And Iteration Under Sequentially Exogenous Unobserved Confounders (2023)
- Off-policy Evaluation And Learning From Logged Bandit Feedback: Error Reduction Via Surrogate Policy (2018)