Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
2022 · Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, et al.
Abstract
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. In particular, we consider Hilbert space embeddings of POMDPs, where the features of latent states and the features of observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transition is deterministic. Under the function approximation setup where the optimal latent state-action \(Q\)-function is linear in the state feature, and the optimal \(Q\)-function has a gap in actions, we provide a *computationally and statistically efficient* algorithm for finding the *exact optimal* policy. We show that our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space. Furthermore, we show both the deterministic latent transitions and
Related papers
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency (2022)
- Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems (2022)
- Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations (2019)
- Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning (2023)
- Experimental Results: Reinforcement Learning of POMDPs Using Spectral Methods (2017)
- Finite-Time Analysis of Natural Actor-Critic for POMDPs (2022)
- Finite-State Controllers for (Hidden-Model) POMDPs Using Deep Reinforcement Learning (2026)
- Near-Optimal Partially Observable Reinforcement Learning with Partial Online State Information (2023)