An Offline Risk-aware Policy Selection Method For Bayesian Markov Decision Processes
2021 · Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel
Abstract
In Offline Model Learning for Planning and in Offline Reinforcement Learning, a limited data set hinders the estimation of the value function of the underlying Markov Decision Process (MDP). Consequently, the real-world performance of the resulting policy is bounded and possibly risky, especially when deploying a wrong policy can lead to catastrophic consequences. For this reason, several research directions aim to reduce the model error (or the distributional shift between the learned model and the true one) and, more broadly, to obtain solutions that are risk-aware with respect to model uncertainty. But when it comes to the final application, which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority, we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty, abiding by the Bayesian formalism, and (2) selects the policy that maximizes a ris…
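The two ingredients named in the abstract (a Bayesian treatment of model uncertainty, and selecting a policy by a risk-aware score) can be illustrated with a small sketch. This is not the authors' EvC implementation. It assumes a finite MDP with known rewards, an independent Dirichlet posterior over each transition row built from offline counts, a fixed list of candidate policies, and conditional value-at-risk (CVaR) as the risk-aware criterion; the abstract is cut off before it names the criterion, so CVaR is a stand-in. All function names and the toy numbers are hypothetical.

```python
import numpy as np


def sample_transition_models(counts, n_samples, rng, prior=1.0):
    """Draw transition tensors P[s, a, s'] from a Dirichlet posterior.

    counts[s, a, s'] holds transition counts from the offline data set;
    `prior` is a symmetric Dirichlet pseudo-count (an assumption of this sketch).
    """
    n_s, n_a, _ = counts.shape
    models = np.empty((n_samples, n_s, n_a, n_s))
    for s in range(n_s):
        for a in range(n_a):
            # One independent Dirichlet posterior per (state, action) pair.
            models[:, s, a, :] = rng.dirichlet(counts[s, a] + prior, size=n_samples)
    return models


def policy_value(P, R, policy, gamma, mu):
    """Exact discounted value of a deterministic policy in one sampled MDP."""
    n_s = P.shape[0]
    idx = np.arange(n_s)
    P_pi = P[idx, policy]  # (n_s, n_s) transition matrix under the policy
    R_pi = R[idx, policy]  # (n_s,) reward vector under the policy
    # Solve the Bellman equation V = R_pi + gamma * P_pi @ V.
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    return float(mu @ V)  # expected value from the initial distribution mu


def cvar(values, alpha):
    """Mean of the worst alpha-fraction of values (alpha = 1.0 gives the plain mean)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, int(np.ceil(alpha * len(v))))
    return float(v[:k].mean())


def select_policy(counts, R, policies, gamma, mu, alpha=0.1, n_samples=1000, seed=0):
    """Return the index of the candidate with the highest CVaR, plus all scores."""
    rng = np.random.default_rng(seed)
    models = sample_transition_models(counts, n_samples, rng)
    scores = [cvar([policy_value(P, R, pi, gamma, mu) for P in models], alpha)
              for pi in policies]
    return int(np.argmax(scores)), scores


# Hypothetical toy problem: state 0 is the start, state 1 is a zero-reward trap.
counts = np.zeros((2, 2, 2))
counts[0, 0, 0] = 100    # "safe" action: seen 100 times, always stayed in state 0
counts[0, 1] = [1, 1]    # "risky" action: seen twice, fell into the trap once
counts[1, :, 1] = 1000   # the trap is (almost surely) absorbing
R = np.array([[0.5, 2.75],  # known rewards R[s, a] (assumed, not learned here)
              [0.0, 0.0]])
mu = np.array([1.0, 0.0])
policies = [np.array([0, 0]), np.array([1, 0])]  # safe, risky

best_cautious, _ = select_policy(counts, R, policies, 0.9, mu, alpha=0.1)
best_neutral, _ = select_policy(counts, R, policies, 0.9, mu, alpha=1.0)
print(best_cautious, best_neutral)
```

In this toy setup, scoring by the posterior mean (alpha = 1.0) favors the rarely observed, high-reward action, while scoring by the worst 10% of sampled models (alpha = 0.1) favors the well-observed one. That gap between exploiting the estimate and being cautious about model uncertainty is the trade-off the method's name points to; the real method may generate and score candidates differently.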
Related papers
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)
- Bayes Adaptive Monte Carlo Tree Search For Offline Model-based Reinforcement Learning (2024)
- One Risk To Rule Them All: A Risk-sensitive Perspective On Model-based Offline Reinforcement Learning (2022)
- Robust Batch Policy Learning In Markov Decision Processes (2020)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- Offline Bayesian Aleatoric And Epistemic Uncertainty Quantification And Posterior Value Optimisation In Finite-state MDPs (2024)