Scaling Marginalized Importance Sampling To High-dimensional State-spaces Via State Abstraction
2022 · Brahma S. Pavse, Josiah P. Hanna
Abstract
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, \(\pi_e\), using a fixed dataset, \(\mathcal{D}\), collected by one or more policies that may be different from \(\pi_e\). Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under \(\pi_e\) is very different from the probability of that same pair occurring in \(\mathcal{D}\) (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms, which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ra
Related papers
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)
- Off-policy Evaluation With Deeply-abstracted States (2024)
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022)
- Doubly Robust Estimator For Off-policy Evaluation With Large Action Spaces (2023)
- Kernel Metric Learning For In-sample Off-policy Evaluation Of Deterministic RL Policies (2024)
- Projected State-action Balancing Weights For Offline Reinforcement Learning (2021)
- Double Reinforcement Learning For Efficient Off-policy Evaluation In Markov Decision Processes (2019)
- A Spectral Approach To Off-policy Evaluation For POMDPs (2021)