A Spectral Approach To Off-policy Evaluation For Pomdps
2021 Β· Yash Nair, Nan Jiang
Abstract
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes, where the evaluation policy depends only on observable variables but the behavior policy depends on latent states (Tennenholtz et al. (2020a)). Prior work on this problem uses a causal identification strategy based on one-step observable proxies of the hidden state, which relies on the invertibility of certain one-step moment matrices. In this work, we relax this requirement by using spectral methods and extending one-step proxies both into the past and future. We empirically compare our OPE methods to existing ones and demonstrate their improved prediction accuracy and greater generality. Lastly, we derive a separate Importance Sampling (IS) algorithm which relies on rank, distinctness, and positivity conditions, and not on the strict sufficiency conditions of observable trajectories with respect to the reward and hidden-state structure required by Tennenholtz et al. (2020a).
Authors
(none)
Tags
Stats
Related papers
- A Minimax Learning Approach To Off-policy Evaluation In Confounded Partially Observable Markov Decision Processes (2021)0.00
- Future-dependent Value-based Off-policy Evaluation In Pomdps (2022)0.00
- Statistical Tractability Of Off-policy Evaluation Of History-dependent Policies In Pomdps (2025)0.00
- Experimental Results : Reinforcement Learning Of Pomdps Using Spectral Methods (2017)0.00
- Off-policy Evaluation In Infinite-horizon Reinforcement Learning With Latent Confounders (2020)0.00
- An Instrumental Variable Approach To Confounded Off-policy Evaluation (2022)0.00
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)0.00
- Intrinsically Efficient, Stable, And Bounded Off-policy Evaluation For Reinforcement Learning (2019)0.00