Abstract

Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain *privileged information*, e.g., access to states from simulators, has been exploited in training with prominent empirical success. To better understand the benefits of privileged information, we revisit and examine several simple paradigms that are used in practice in this setting. Specifically, we first formalize the empirical paradigm of *expert distillation* (also known as *teacher-student* learning) and demonstrate its pitfall in finding near-optimal policies. We then identify a condition on the partially observable environment, the *deterministic filter condition*, under which expert distillation achieves sample and computational complexities that are *both* polynomial. Furthermore, we investigate another useful empirical paradigm, *asymmetric actor-critic*, and focus on the more challenging setting of observable partially obs

Stats

  • citations: 1
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 2.26
  • arxiv key: cai2024provable

Related papers