Sample-Efficient Learning of POMDPs with Multiple Observations in Hindsight
2023 · Jiacheng Guo, Minshuo Chen, Huan Wang, et al.
Abstract
This paper studies the sample efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight", where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: *multi-observation revealing POMDPs* and *distinguishable POMDPs*. Both subclasses generalize and substantially relax *revealing POMDPs*, a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission dist
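The feedback model described in the abstract can be illustrated with a minimal sketch for a tabular POMDP. Everything here is an assumption made for illustration and not taken from the paper: the function name `rollout_with_hindsight`, the array layouts of `T`, `O`, and `mu0`, the choice to draw exactly `k` extra observations per step, and the choice to sample them i.i.d. from the emission distribution of each visited latent state.

```python
import numpy as np

def rollout_with_hindsight(T, O, mu0, policy, H, k, rng):
    """Run one episode, then draw k extra observations per visited latent state.

    T:   (A, S, S) transition tensor, T[a, s] is a distribution over next states.
    O:   (S, Z) emission matrix, O[s] is a distribution over observations.
    mu0: (S,) initial state distribution.
    Returns the usual (observation, action) trajectory plus an (H, k) array of
    hindsight observations. The latent states themselves are never returned.
    """
    n_states, n_obs = O.shape
    s = rng.choice(n_states, p=mu0)
    visited, traj = [], []
    for h in range(H):
        o = rng.choice(n_obs, p=O[s])      # standard observation at step h
        a = policy(h, o)                   # hypothetical memoryless policy
        visited.append(s)
        traj.append((int(o), int(a)))
        s = rng.choice(n_states, p=T[a, s])
    # Hindsight phase (assumed i.i.d.): k fresh emissions from each visited state.
    hindsight = np.stack([rng.choice(n_obs, size=k, p=O[s_h]) for s_h in visited])
    return traj, hindsight

# Toy instance with 2 states, 2 actions, 2 observations (made-up numbers).
rng = np.random.default_rng(0)
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[0.7, 0.3], [0.4, 0.6]])
mu0 = np.array([0.5, 0.5])
traj, hindsight = rollout_with_hindsight(T, O, mu0, lambda h, o: o, H=5, k=3, rng=rng)
```

The learner in this sketch sees only `traj` and `hindsight`; the list of visited latent states stays inside the function, which mirrors the restriction that latent states are not observed.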
Related papers
- Posterior Sampling-based Online Learning for Episodic POMDPs (2023)
- Sample-Efficient Reinforcement Learning of Partially Observable Markov Games (2022)
- Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs (2025)
- Efficient Learning of POMDPs with Known Observation Model in Average-reward Setting (2024)
- Near-optimal Partially Observable Reinforcement Learning with Partial Online State Information (2023)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency (2022)
- Experimental Results: Reinforcement Learning of POMDPs Using Spectral Methods (2017)
- Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations (2019)