Embed To Control Partially Observed Systems: Representation Learning With Provable Sample Efficiency
2022 · Lingxiao Wang, Qi Cai, Zhuoran Yang, et al.
Abstract
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent the full history with a low-dimensional embedding, which assembles the per-step feature. We integrate
Related papers
- Reinforcement Learning From Partial Observation: Linear Function Approximation With Provable Sample Efficiency (2022)
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)
- Sample-efficient Learning Of POMDPs With Multiple Observations In Hindsight (2023)
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)
- Finite-state Controllers For (hidden-model) POMDPs Using Deep Reinforcement Learning (2026)
- Learning Interpretable Policies In Hindsight-observable POMDPs Through Partially Supervised Reinforcement Learning (2024)
- GEC: A Unified Framework For Interactive Decision Making In MDP, POMDP, And Beyond (2022)