Reinforcement Learning From Partial Observation: Linear Function Approximation With Provable Sample Efficiency
2022 · Qi Cai, Zhuoran Yang, Zhaoran Wang
Abstract
We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an \(\epsilon\)-optimal policy within \(O(1/\epsilon^2)\) episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which featu
Related papers
- Embed To Control Partially Observed Systems: Representation Learning With Provable Sample Efficiency (2022) 0.00
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022) 0.00
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023) 0.00
- Computationally Efficient PAC RL In POMDPs With Latent Determinism And Conditional Embeddings (2022) 0.00
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023) 0.00
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021) 0.00
- Provably Efficient Reinforcement Learning With Linear Function Approximation (2019) 11.76
- Sample-efficient Learning Of POMDPs With Multiple Observations In Hindsight (2023) 0.00