Bootstrapping With Models: Confidence Intervals For Off-policy Evaluation
2016 · Josiah P. Hanna, Peter Stone, Scott Niekum
Abstract
For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is est
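The core idea of combining bootstrapping with a learned model can be illustrated with a small sketch. This is not the authors' exact algorithm; it assumes a discrete state space, a tabular maximum-likelihood transition and reward model, an undiscounted finite horizon, and a percentile bootstrap over whole trajectories. All function and parameter names (`fit_model`, `model_value`, `bootstrap_lower_bound`, `delta`, `num_boot`) are hypothetical, and the continuous-state case covered by the paper is not handled here.

```python
import random
from collections import defaultdict


def fit_model(trajectories):
    """Tabular maximum-likelihood model: P(s'|s,a), mean R(s,a), start-state distribution.

    Each trajectory is a non-empty list of (s, a, r, s_next) tuples.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: count}
    reward_sum = defaultdict(float)                 # (s, a) -> summed reward
    start = defaultdict(int)                        # initial state -> count
    for traj in trajectories:
        start[traj[0][0]] += 1
        for s, a, r, s_next in traj:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        P[sa] = {s2: c / n for s2, c in nexts.items()}
        R[sa] = reward_sum[sa] / n
    d0 = {s: c / len(trajectories) for s, c in start.items()}
    return P, R, d0


def model_value(P, R, d0, policy, horizon):
    """Finite-horizon (undiscounted) expected return of `policy` inside the learned model.

    `policy(s)` returns a dict {action: probability}. Assumption: state-action pairs never
    seen in the data contribute zero reward and end the episode.
    """
    states = {s for (s, _a) in P}
    V = defaultdict(float)  # value with k steps remaining; k = 0 initially
    for _ in range(horizon):
        new_V = defaultdict(float)
        for s in states:
            for a, p_a in policy(s).items():
                if (s, a) not in P:
                    continue
                q = R[(s, a)] + sum(p * V[s2] for s2, p in P[(s, a)].items())
                new_V[s] += p_a * q
        V = new_V
    return sum(p * V[s] for s, p in d0.items())


def bootstrap_lower_bound(trajectories, policy, horizon, delta=0.05, num_boot=200, seed=0):
    """Approximate (1 - delta) lower confidence bound via a model-based percentile bootstrap."""
    rng = random.Random(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(num_boot):
        # Resample whole trajectories with replacement, refit the model, re-evaluate.
        sample = [trajectories[rng.randrange(n)] for _ in range(n)]
        P, R, d0 = fit_model(sample)
        estimates.append(model_value(P, R, d0, policy, horizon))
    estimates.sort()
    # The delta-quantile of the bootstrap distribution serves as the lower bound.
    return estimates[int(delta * num_boot)]
```

For example, with behavior data `trajs` and an evaluation policy `pi = lambda s: {'a': 1.0}`, calling `bootstrap_lower_bound(trajs, pi, horizon=2)` returns an approximate 95% lower bound. As the abstract notes, such a bound is only approximate: if the learned model is biased, every bootstrap estimate inherits that bias, which is why a bound on model bias matters.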
Related papers
- Statistical Bootstrapping For Uncertainty Estimation In Off-policy Evaluation (2020)
- Bootstrapping Fitted Q-evaluation For Off-policy Inference (2021)
- Non-asymptotic Confidence Intervals Of Off-policy Evaluation: Primal And Dual Bounds (2021)
- Combining Parametric And Nonparametric Models For Off-policy Evaluation (2019)
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019)
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022)
- Interpretable Off-policy Evaluation In Reinforcement Learning By Highlighting Influential Transitions (2020)
- High-confidence Error Estimates For Learned Value Functions (2018)