Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
2023 · David Bruns-Smith, Angela Zhou
Abstract
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is unknown. However, most methods assume all covariates used in the behavior policy's action decisions are observed. Though this assumption, sequential ignorability/unconfoundedness, likely does not hold in observational data, most of the data that accounts for selection into treatment may be observed, motivating sensitivity analysis. We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration that uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function, and adds a bias-correction to quantile estimation. Our algorithm enjoys the computational ease of fitted-Q-iteration and statistical
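To make the idea of a closed-form robust Bellman backup concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a generic box-constrained sensitivity model in which the unknown reweighting of next-state outcomes lies between two hypothetical bounds `lo` and `hi` and averages to one; the paper's specific bounds are not given in this excerpt. Under that assumption the worst-case (lowest) expected next-state value is attained by a threshold rule that puts weight `hi` on the smallest values and `lo` on the largest, which is where a quantile enters. The function names `robust_lower_mean` and `robust_backup` are illustrative only.

```python
def robust_lower_mean(values, lo, hi):
    """Smallest weighted mean of `values` over weights w_i in [lo, hi]
    whose average is 1 (hypothetical box-constrained sensitivity model)."""
    if not (0.0 <= lo <= 1.0 <= hi):
        raise ValueError("need 0 <= lo <= 1 <= hi")
    ys = sorted(float(v) for v in values)
    n = len(ys)
    if n == 0:
        raise ValueError("values must be non-empty")

    # Every sample starts at weight `lo`; the remaining mass is spent
    # greedily on the smallest outcomes, capped at `hi` per sample.
    budget = n * (1.0 - lo)
    total = 0.0
    for y in ys:
        extra = min(hi - lo, budget)
        budget -= extra
        total += (lo + extra) * y
    return total / n


def robust_backup(reward, next_values, gamma, lo, hi):
    """One pessimistic Bellman target: reward plus the discounted
    worst-case mean of sampled next-state values."""
    return reward + gamma * robust_lower_mean(next_values, lo, hi)


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    # With lo = hi = 1 the weights are forced to 1 and this is the plain mean.
    print(robust_lower_mean(samples, 1.0, 1.0))
    # Wider bounds shift weight toward the low outcomes and lower the target.
    print(robust_backup(1.0, samples, 0.9, 0.5, 1.5))
```

In a fitted-Q loop, targets like these would replace the usual `reward + gamma * next_value` regression targets. The sketch works on raw samples and omits function approximation and the bias correction for quantile estimation that the abstract mentions.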
Related papers
- Confounding-robust Policy Evaluation In Infinite-horizon Reinforcement Learning (2020)
- Doubly Robust Interval Estimation For Optimal Policy Evaluation In Online Learning (2021)
- Online Estimation And Inference For Robust Policy Evaluation In Reinforcement Learning (2023)
- Structured Difference-of-Q Via Orthogonal Learning (2024)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- A Complete Characterization Of Linear Estimators For Offline Policy Evaluation (2022)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021)