Provable Partially Observable Reinforcement Learning With Privileged Information
2024 · Yang Cai, Xiangyu Liu, Argyris Oikonomou, et al.
Abstract
Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain *privileged information*, e.g., the access to states from simulators, has been exploited in training and has achieved prominent empirical successes. To better understand the benefits of privileged information, we revisit and examine several simple and practically used paradigms in this setting. Specifically, we first formalize the empirical paradigm of *expert distillation* (also known as *teacher-student* learning), demonstrating its pitfall in finding near-optimal policies. We then identify a condition of the partially observable environment, the *deterministic filter condition*, under which expert distillation achieves sample and computational complexities that are *both* polynomial. Furthermore, we investigate another useful empirical paradigm of *asymmetric actor-critic*, and focus on the more challenging setting of observable partially obs
Related papers
- Reinforcement Learning Under Partial Observability Guided By Learned Environment Models (2022)
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- Task-guided Inverse Reinforcement Learning Under Partial Information (2021)
- Unbiased Asymmetric Reinforcement Learning Under Partial Observability (2021)
- Informed Asymmetric Actor-critic: Leveraging Privileged Signals Beyond Full-state Access (2025)
- Partially Observable RL With B-stability: Unified Structural Condition And Sharp Sample-efficient Algorithms (2022)
- Guided Policy Optimization Under Partial Observability (2025)