Semi-supervised Off Policy Reinforcement Learning
2020 · Aaron Sonabend-W, Nilanjana Laha, Ashwin N. Ananthakrishnan, et al.
Abstract
Reinforcement learning (RL) has shown great success in estimating sequential treatment strategies that take into account patient heterogeneity. However, health-outcome information, which is used as the reward for reinforcement learning methods, is often not well coded but rather embedded in clinical notes. Extracting precise outcome information is a resource-intensive task, so most of the available well-annotated cohorts are small. To address this issue, we propose a semi-supervised learning (SSL) approach that efficiently leverages a small labeled data set with the true outcome observed and a large unlabeled data set with outcome surrogates. In particular, we propose a semi-supervised, efficient approach to Q-learning and doubly robust off-policy value estimation. Generalizing SSL to sequential treatment regimes brings interesting challenges: 1) The feature distribution for Q-learning is unknown, as it includes previous outcomes. 2) The surrogate variables we leverage in the modified SSL frame
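To make the setting in the abstract concrete, the following is a minimal sketch of two-stage Q-learning in which outcomes missing from the unlabeled data are imputed from surrogates before backward induction. It is a generic illustration, not the authors' estimator: the linear models, the binary actions, the two-stage horizon, and every name in it (`fit_linear`, `q_features`, `ssl_q_learning`, the surrogate arrays `w1` and `w2`) are assumptions made for the example, and it does not cover the efficiency or doubly robust aspects the abstract refers to.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions from coefficients produced by fit_linear."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


def q_features(x, a):
    """Hypothetical linear Q-function features: state, action, interaction."""
    return np.column_stack([x, a, x * a])


def ssl_q_learning(x1, a1, w1, y1, x2, a2, w2, y2, labeled):
    """Two-stage Q-learning with surrogate-based outcome imputation.

    x*, a*, w*, y* are 1-D arrays of states, binary actions (0/1),
    surrogates, and outcomes per stage. `labeled` is a boolean mask;
    y1 and y2 are only read where it is True.
    Returns (theta1, theta2), each ordered as
    [intercept, state, action, state * action].
    """
    # Step 1: learn outcome ~ (state, action, surrogate) on labeled rows,
    # then impute the outcome on unlabeled rows (assumed imputation model).
    f1 = np.column_stack([x1, a1, w1])
    f2 = np.column_stack([x2, a2, w2])
    c1 = fit_linear(f1[labeled], y1[labeled])
    c2 = fit_linear(f2[labeled], y2[labeled])
    y1_hat = np.where(labeled, y1, predict_linear(c1, f1))
    y2_hat = np.where(labeled, y2, predict_linear(c2, f2))

    # Step 2: backward induction. Fit the stage-2 Q-function first.
    theta2 = fit_linear(q_features(x2, a2), y2_hat)

    # Value of acting greedily at stage 2, maximizing over both actions.
    q2_best = np.maximum(
        predict_linear(theta2, q_features(x2, np.zeros_like(a2))),
        predict_linear(theta2, q_features(x2, np.ones_like(a2))),
    )

    # Stage-1 pseudo-outcome: immediate outcome plus optimal future value.
    theta1 = fit_linear(q_features(x1, a1), y1_hat + q2_best)
    return theta1, theta2
```

Under these assumptions, the fitted stage-2 rule treats when `theta2[2] + theta2[3] * x2 > 0`, and the stage-1 rule is read off `theta1` in the same way.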
Related papers
- Federated Offline Reinforcement Learning (2022) 0.00
- Clinician-in-the-loop Decision Making: Reinforcement Learning With Near-optimal Set-valued Policies (2020) 0.00
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020) 0.00
- Reinforcement Learning In Dynamic Treatment Regimes Needs Critical Reexamination (2024) 2.35
- Reinforcement Learning Enhanced Online Adaptive Clinical Decision Support Via Digital Twin Powered Policy And Treatment Effect Optimized Reward (2025) 0.00
- An Empirical Study Of Representation Learning For Reinforcement Learning In Healthcare (2020) 0.00
- Statistically Efficient Advantage Learning For Offline Reinforcement Learning In Infinite Horizons (2022) 0.00
- Robust Fitted-Q-Evaluation And Iteration Under Sequentially Exogenous Unobserved Confounders (2023) 0.00