Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation
2020 · Aaron Sonabend-W, Junwei Lu, Leo A. Celi, et al.
Abstract
Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and finally, 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds con
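The abstract's first two contributions (deviating from the expert only when a hypothesis test supports it, with a tunable level of risk aversion) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `choose_action`, the threshold `alpha`, and the assumption that posterior draws of the action values at a single state are already available as an array are all hypothetical choices made for illustration.

```python
import numpy as np

def choose_action(q_samples, expert_action, alpha=0.05):
    """Pick an action at one state from posterior draws of action values.

    q_samples: array of shape (n_samples, n_actions); each row is one
        (assumed) posterior draw of Q(s, a) for every action a.
    expert_action: index of the action the expert/behavior policy takes.
    alpha: hypothetical risk-aversion level; smaller values make it
        harder to deviate from the expert.
    Returns (action, p_null), where p_null is the estimated posterior
    probability that the expert action is at least as good as the
    greedy alternative.
    """
    q_samples = np.asarray(q_samples, dtype=float)
    # Greedy candidate: the action with the highest posterior mean value.
    greedy_action = int(np.argmax(q_samples.mean(axis=0)))
    if greedy_action == expert_action:
        # Nothing to test: the expert already takes the greedy action.
        return expert_action, 1.0
    # Null hypothesis: the expert action is at least as good as the greedy one.
    p_null = float(np.mean(q_samples[:, expert_action] >= q_samples[:, greedy_action]))
    # Deviate from the expert only when the null is sufficiently unlikely.
    action = greedy_action if p_null < alpha else expert_action
    return action, p_null

# Clear evidence for action 1: the sketch deviates from the expert.
confident = np.array([[0.0, 1.0]] * 100)
# Mixed evidence (expert wins in 40% of draws): the sketch keeps the expert action.
uncertain = np.array([[0.0, 1.0]] * 60 + [[1.0, 0.0]] * 40)
print(choose_action(confident, expert_action=0))
print(choose_action(uncertain, expert_action=0))
```

Lowering `alpha` in this sketch yields a more conservative policy that follows the expert more often, which is one simple way to read the "different levels of risk-averse implementations" mentioned above.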
Related papers
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Statistically Efficient Advantage Learning For Offline Reinforcement Learning In Infinite Horizons (2022)
- Preserving Expert-level Privacy In Offline Reinforcement Learning (2024)
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)
- One Risk To Rule Them All: A Risk-sensitive Perspective On Model-based Offline Reinforcement Learning (2022)
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)
- MOReL: Model-based Offline Reinforcement Learning (2020)