Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information
2023 Β· Ming Shi, Yingbin Liang, Ness B. Shroff
Abstract
Partially observable Markov decision processes (POMDPs) are a general framework for sequential decision-making under latent state uncertainty, yet learning in POMDPs is intractable in the worst case. Motivated by sensing and probing constraints in practice, we study how much online state information (OSI) is sufficient to enable efficient learning guarantees. We formalize a model in which the learner can query only partial OSI (POSI) during interaction. We first prove an information-theoretic hardness result showing that, for general POMDPs, achieving an \(\epsilon\)-optimal policy can require sample complexity that is exponential unless full OSI is available. We then identify two structured subclasses that remain learnable under POSI and propose corresponding algorithms with provably efficient performance guarantees. In particular, we establish regret upper bounds with \(\tilde\{O\}(\sqrt\{K\})\) dependence on the number of episodes \(K\), together with complementary lower bounds, the
Authors
(none)
Tags
Stats
Related papers
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)0.00
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)0.00
- Reinforcement Learning From Partial Observation: Linear Function Approximation With Provable Sample Efficiency (2022)0.00
- Posterior Sampling-based Online Learning For Episodic Pomdps (2023)0.00
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021)0.00
- Sequential Monte Carlo For Policy Optimization In Continuous Pomdps (2025)0.00
- Sample-efficient Reinforcement Learning Of Partially Observable Markov Games (2022)0.00
- Robust Reinforcement Learning In Pomdps With Incomplete And Noisy Observations (2019)0.00