Offline Reinforcement Learning With Instrumental Variables In Confounded Markov Decision Processes
2022 · Zuyue Fu, Zhengling Qi, Zhaoran Wang, et al.
Abstract
We study offline reinforcement learning (RL) in the face of unmeasured confounders. Because the agent cannot interact with the environment online, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data, collected a priori, does not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded Markov decision processes (MDPs) with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose several policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and one numerical stud
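To illustrate why an instrumental variable helps when confounders are unmeasured, here is a minimal one-step sketch under loudly stated assumptions. It is not the paper's VF-based or MIS-based identification for confounded MDPs; it uses a hypothetical linear model with a hidden confounder `U`, an instrument `Z` that shifts the action `A` but is independent of `U`, and a true causal effect of `A` on the reward `R` equal to 2. The helper names (`simulate`, `naive_slope`, `iv_slope`) are hypothetical. Regressing `R` on `A` alone is biased by `U`, while the classical Wald/IV ratio `cov(Z, R) / cov(Z, A)` uses only the variation in `A` driven by `Z`.

```python
import random


def simulate(n, seed=0):
    """Toy confounded data (hypothetical linear model, not from the paper)."""
    rng = random.Random(seed)
    zs, actions, rewards = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)           # unmeasured confounder (hidden from the learner)
        z = rng.gauss(0.0, 1.0)           # instrument: shifts A, independent of U
        a = z + u + rng.gauss(0.0, 0.5)   # action depends on both Z and U
        r = 2.0 * a + 3.0 * u + rng.gauss(0.0, 0.5)  # true causal effect of A on R is 2
        zs.append(z)
        actions.append(a)
        rewards.append(r)
    return zs, actions, rewards


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def naive_slope(actions, rewards):
    # Regressing R on A alone absorbs part of U's effect into the slope.
    return cov(actions, rewards) / cov(actions, actions)


def iv_slope(zs, actions, rewards):
    # Wald / IV ratio: uses only the variation in A that is driven by Z.
    return cov(zs, rewards) / cov(zs, actions)


zs, actions, rewards = simulate(100_000)
print(f"naive slope: {naive_slope(actions, rewards):.3f}")
print(f"IV slope:    {iv_slope(zs, actions, rewards):.3f}")
```

Under this toy model the population naive slope is (2 · 2.25 + 3) / 2.25 = 10/3, so the naive estimate should land near 3.33, while the IV estimate should land near the true effect of 2. The paper extends the instrumental-variable idea to sequential, confounded MDPs; this sketch only conveys the single-step intuition.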
Related papers
- Instrumental Variable Value Iteration For Causal Offline Reinforcement Learning (2021)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- Reinforcement Learning With Continuous Actions Under Unmeasured Confounding (2025)
- Strategic Decision-making In The Presence Of Information Asymmetry: Provably Efficient RL With Algorithmic Instruments (2022)
- Mutual Information Regularized Offline Reinforcement Learning (2022)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)
- Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data (2025)