Bayesian Learning Of The Optimal Action-value Function In A Markov Decision Process
2025 · Jiaqi Guo, Chon Wai Ho, Sumeetpal S. Singh
Abstract
The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of learning optimal decision-making strategies within it. In particular, a Bayesian framework is used to maintain beliefs about the optimal decisions and about the unknown ingredients of the model, such as the rewards and state dynamics, which must also be learned from the data. However, many existing Bayesian approaches for learning the optimal decision-making strategy rely on unrealistic modelling assumptions and use approximate inference techniques. This raises doubts about whether the benefits of Bayesian uncertainty quantification are fully realised or can be relied upon. We focus on infinite-horizon, undiscounted MDPs with finite state and action spaces and a terminal state. We provide a full Bayesian framework, from modelling to inference to decision-making. For modelling, we introduce a likelihood function with minimal assumptions…
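To make the setting concrete, the sketch below computes the optimal action-value function of an undiscounted MDP with a terminal state by value iteration on the Bellman optimality equation Q*(s, a) = r(s, a) + Σ_s' p(s' | s, a) max_a' Q*(s', a'), with Q* fixed to 0 at the terminal state. It then propagates uncertainty about the state dynamics into a posterior over Q* by sampling transition probabilities from Dirichlet posteriors and solving each sample. This is a conventional model-based construction under stated assumptions (known rewards, independent Dirichlet priors, hypothetical transition counts), not the likelihood or inference scheme the authors propose; the function names `solve_q_star` and `posterior_q_samples` are hypothetical.

```python
import numpy as np


def solve_q_star(P, R, terminal, tol=1e-10, max_iter=100_000):
    """Value iteration for the undiscounted Bellman optimality equation.

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards;
    terminal: index of the absorbing terminal state, whose value is 0.
    Assumes every policy reaches the terminal state with probability 1.
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(max_iter):
        V = Q.max(axis=1)          # greedy state values
        V[terminal] = 0.0          # terminal state has zero value
        Q_new = R + np.einsum("sat,t->sa", P, V)
        Q_new[terminal] = 0.0
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    raise RuntimeError("value iteration did not converge")


def posterior_q_samples(counts, R, terminal, prior=1.0, n_samples=200, seed=0):
    """Sample Q* under a Dirichlet(counts + prior) posterior on each p(. | s, a).

    Assumption: rewards R are known; only the dynamics are uncertain.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = counts.shape
    samples = np.empty((n_samples, S, A))
    for k in range(n_samples):
        P = np.empty((S, A, S))
        for s in range(S):
            for a in range(A):
                P[s, a] = rng.dirichlet(counts[s, a] + prior)
        samples[k] = solve_q_star(P, R, terminal)
    return samples


# Hypothetical data: 3 states (state 2 is terminal), 2 actions,
# reward -1 per step, so -Q*(s, a) is the expected number of steps to finish.
counts = np.zeros((3, 2, 3))
counts[0, 0] = [2, 5, 3]
counts[0, 1] = [0, 1, 9]
counts[1, 0] = [4, 4, 2]
counts[1, 1] = [1, 6, 3]
R = -np.ones((3, 2))

samples = posterior_q_samples(counts, R, terminal=2)
print("posterior mean of Q*:\n", samples.mean(axis=0)[:2])
print("posterior std of Q*:\n", samples.std(axis=0)[:2])

# Posterior probability that each action is optimal in each state.
p_opt = np.stack(
    [(samples.argmax(axis=2) == a).mean(axis=0) for a in range(2)], axis=1
)
print("P(action optimal):\n", p_opt[:2])
```

Because the prior adds positive mass to every next state, each sampled transition kernel reaches the terminal state with positive probability from every state-action pair, which keeps undiscounted value iteration convergent in this sketch.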
Related papers
- Bayesian Learning Of Optimal Policies In Markov Decision Processes With Countably Infinite State-space (2023)
- Offline Bayesian Aleatoric And Epistemic Uncertainty Quantification And Posterior Value Optimisation In Finite-state MDPs (2024)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear MDPs (2023)
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)
- \(\sqrt{n}\)-regret For Learning In Markov Decision Processes With Function Approximation And Low Bellman Rank (2019)
- Planning And Learning In Average Risk-aware MDPs (2025)
- Model-free Reinforcement Learning For Branching Markov Decision Processes (2021)