Posterior Sampling-based Online Learning For Episodic Pomdps
2023 Β· Dengwang Tang, Dongze Ye, Rahul Jain, et al.
Abstract
Learning in POMDPs is known to be significantly harder than in MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes and is polynomial in the other parameters. In a general setting, the regret scales exponentially in the horizon length \(H\), and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.
Authors
(none)
Tags
Stats
Related papers
- Sample-efficient Learning Of Pomdps With Multiple Observations In Hindsight (2023)0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)0.00
- Optimistic Posterior Sampling For Reinforcement Learning With Few Samples And Tight Guarantees (2022)0.00
- Efficient Learning Of Pomdps With Known Observation Model In Average-reward Setting (2024)0.00
- Experimental Results : Reinforcement Learning Of Pomdps Using Spectral Methods (2017)0.00
- Sample Efficient Reinforcement Learning With Partial Dynamics Knowledge (2023)3.58
- Dynamic Regret Of Online Markov Decision Processes (2022)0.00