Instrumental Variable Value Iteration For Causal Offline Reinforcement Learning
2021 · Luofeng Liao, Zuyue Fu, Zhuoran Yang, et al.
Abstract
In offline reinforcement learning (RL), an optimal policy is learned solely from observational data collected a priori. However, in observational data, actions are often confounded by unobserved variables. In the context of RL, instrumental variables (IVs) are variables whose influence on the state variables is entirely mediated by the action. When a valid instrument is present, we can recover the confounded transition dynamics from observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction through which we can identify the transition dynamics from observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction. To our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
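To make the role of the instrument concrete, the following is a minimal toy sketch, not the paper's IVVI algorithm. It assumes a one-dimensional, linear version of the additive transition model, with the next state equal to a function of the state and action plus a confounded noise term. The function names (`simulate`, `ols_effect`, `iv_effect`) and all coefficients are hypothetical and chosen only for illustration. It compares ordinary least squares, which ignores the confounder, with a just-identified IV estimator built from the moment condition that the instrument and current state are uncorrelated with the transition noise.

```python
import numpy as np


def simulate(n, seed=0):
    """Toy confounded transitions (illustrative assumption, not from the paper).

    s_next = 0.5 * s + 2.0 * a + noise, where the noise contains an
    unobserved confounder u that also drives the action a.
    """
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)   # current state
    z = rng.normal(size=n)   # instrument: affects s_next only through a
    u = rng.normal(size=n)   # unobserved confounder
    a = z + u + 0.5 * rng.normal(size=n)
    s_next = 0.5 * s + 2.0 * a + 3.0 * u + 0.5 * rng.normal(size=n)
    return s, a, z, s_next


def ols_effect(s, a, s_next):
    """Least-squares coefficient on the action; biased when a is confounded."""
    X = np.column_stack([s, a])
    beta, *_ = np.linalg.lstsq(X, s_next, rcond=None)
    return beta[1]


def iv_effect(s, a, z, s_next):
    """Just-identified IV estimate from the moment E[(s, z) * residual] = 0."""
    X = np.column_stack([s, a])   # regressors
    Z = np.column_stack([s, z])   # instruments (the state instruments itself)
    beta = np.linalg.solve(Z.T @ X, Z.T @ s_next)
    return beta[1]


if __name__ == "__main__":
    s, a, z, s_next = simulate(20000)
    print("true action effect: 2.0")
    print("OLS estimate:", ols_effect(s, a, s_next))
    print("IV estimate: ", iv_effect(s, a, z, s_next))
```

In this toy setting the OLS estimate is pushed away from the true coefficient because the action is correlated with the noise, while the IV estimate is consistent. The paper addresses the nonlinear case, where the analogous conditional moment restriction is solved through a primal-dual reformulation rather than a closed-form linear solve.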
Related papers
- Offline Reinforcement Learning With Instrumental Variables In Confounded Markov Decision Processes (2022)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- Offline RL With No OOD Actions: In-sample Learning Via Implicit Value Regularization (2023)
- Pessimistic Nonlinear Least-squares Value Iteration For Offline Reinforcement Learning (2023)
- Confounded Causal Imitation Learning With Instrumental Variables (2025)
- Is Value Learning Really The Main Bottleneck In Offline RL? (2024)
- Viper: Provably Efficient Algorithm For Offline RL With Neural Function Approximation (2023)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)