Provably Efficient Causal Reinforcement Learning With Confounded Observational Data
2020 · Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
Abstract
Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes. However, learning expressive function approximators requires collecting a large dataset (interventional data) by interacting with the environment. Such a lack of sample efficiency prohibits the application of DRL to critical scenarios, e.g., autonomous driving and personalized medicine, since trial and error in the online setting is often unsafe and even unethical. In this paper, we study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting. To incorporate the possibly confounded observational data, we propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner. More specifically, DOVI explicitly adjusts for the confounding bias
Related papers
- Causal Deep Reinforcement Learning Using Observational Data (2022)
- Causal Reinforcement Learning Using Observational And Interventional Data (2021)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- Pessimistic Causal Reinforcement Learning With Mediators For Confounded Offline Data (2024)
- ReCCoVER: Detecting Causal Confusion For Explainable Reinforcement Learning (2022)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Instrumental Variable Value Iteration For Causal Offline Reinforcement Learning (2021)
- Data Valuation For Offline Reinforcement Learning (2022)