Bayesian Learning Of Optimal Policies In Markov Decision Processes With Countably Infinite State-space
2023 · Saghar Adler, Vijay Subramanian
Abstract
Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter \(\theta\in\Theta\), and defined on a countably-infinite state space \(\mathcal X=\mathbb{Z}_+^d\), with finite action space \(\mathcal A\), and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter \(\boldsymbol{\theta}^*\) generated via a given fixed prior distribution on \(\Theta\). To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode,
Related papers
- Bayesian Learning Of The Optimal Action-value Function In A Markov Decision Process (2025) 0.00
- Offline Bayesian Aleatoric And Epistemic Uncertainty Quantification And Posterior Value Optimisation In Finite-state MDPs (2024) 0.95
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023) 3.58
- A Policy Gradient Approach For Finite Horizon Constrained Markov Decision Processes (2022) 3.58
- Bayesian Risk-sensitive Policy Optimization For MDPs With General Loss Functions (2025) 0.00
- Adaptive Sampling For Best Policy Identification In Markov Decision Processes (2020) 0.00
- An Optimal Policy For Learning Controllable Dynamics By Exploration (2025) 0.00
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020) 8.60