Statistical Tractability Of Off-policy Evaluation Of History-dependent Policies In POMDPs
2025 · Yuheng Zhang, Nan Jiang
Abstract
We investigate off-policy evaluation (OPE), a central problem in reinforcement learning (RL), in the challenging setting of Partially Observable Markov Decision Processes (POMDPs) with large observation spaces. Recent works by Uehara et al. (2023a) and Zhang & Jiang (2024) developed a model-free framework and identified important coverage assumptions (called belief and outcome coverage) that enable accurate OPE of memoryless policies with polynomial sample complexities, but handling more general target policies that depend on the entire observable history remained an open problem. In this work, we prove information-theoretic hardness for model-free OPE of history-dependent policies in several settings, characterized by additional assumptions imposed on the behavior policy (memoryless vs. history-dependent) and/or the state-revealing property of the POMDP (single-step vs. multi-step revealing). We further show that some hardness can be circumvented by a natural model-based
Related papers
- A Spectral Approach To Off-policy Evaluation For POMDPs (2021)
- Future-dependent Value-based Off-policy Evaluation In POMDPs (2022)
- A Minimax Learning Approach To Off-policy Evaluation In Confounded Partially Observable Markov Decision Processes (2021)
- Double Reinforcement Learning For Efficient Off-policy Evaluation In Markov Decision Processes (2019)
- Sample-efficient Learning Of POMDPs With Multiple Observations In Hindsight (2023)
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021)
- Off-policy Evaluation In Infinite-horizon Reinforcement Learning With Latent Confounders (2020)
- Conformal Off-policy Evaluation In Markov Decision Processes (2023)