Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning
2024 · Michael Lanier, Ying Xu, Nathan Jacobs, et al.
Abstract
Deep reinforcement learning has demonstrated remarkable achievements across diverse domains such as video games, robotic control, autonomous driving, and drug discovery. Common methodologies in partially observable domains rely largely on end-to-end learning from high-dimensional observations, such as images, without explicitly reasoning about the true state. We suggest an alternative direction, introducing the Partially Supervised Reinforcement Learning (PSRL) framework. At the heart of PSRL is the fusion of supervised and unsupervised learning. The approach leverages a state estimator to distill supervised semantic state information from high-dimensional observations, where the state is often fully observable at training time. This yields more interpretable policies that compose state predictions with control. In parallel, it captures an unsupervised latent representation. These two, the semantic state and the latent state, are then fused and used as inputs to a policy network. This juxtapo
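The fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear layers, the `tanh` latent, the mean-squared-error state loss, and every name (`FusedPolicy`, `state_loss`, the dimensions) are assumptions chosen only to show how a supervised semantic state estimate and an unsupervised latent might be concatenated and passed to a policy head.

```python
import math
import random


def linear(weights, bias, x):
    """Apply an affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FusedPolicy:
    """Hypothetical PSRL-style policy: semantic estimate + latent -> action probabilities."""

    def __init__(self, obs_dim, state_dim, latent_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # State estimator: would be trained with supervision from the true state.
        self.estimator = init_layer(rng, state_dim, obs_dim)
        # Latent encoder: would be trained without state labels.
        self.encoder = init_layer(rng, latent_dim, obs_dim)
        # Policy head: consumes the concatenated (fused) vector.
        self.policy = init_layer(rng, n_actions, state_dim + latent_dim)

    def forward(self, obs):
        semantic = linear(*self.estimator, obs)
        latent = [math.tanh(v) for v in linear(*self.encoder, obs)]
        fused = semantic + latent  # list concatenation, not element-wise addition
        probs = softmax(linear(*self.policy, fused))
        return semantic, latent, probs


def state_loss(predicted, true_state):
    """Assumed supervised objective: mean squared error against the true state."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true_state)) / len(true_state)
```

A call such as `FusedPolicy(obs_dim=8, state_dim=3, latent_dim=4, n_actions=2).forward([0.5] * 8)` returns the semantic estimate, the latent vector, and the action probabilities. Exposing the semantic estimate alongside the action distribution is what allows the state prediction to be inspected separately from the control decision.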
Related papers
- On Improving Deep Reinforcement Learning For POMDPs (2017)
- Deep Hierarchical Reinforcement Learning Algorithm In Partially Observable Markov Decision Processes (2018)
- Finite-state Controllers For (hidden-model) POMDPs Using Deep Reinforcement Learning (2026)
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)
- Robust Reinforcement Learning In POMDPs With Incomplete And Noisy Observations (2019)
- Embed To Control Partially Observed Systems: Representation Learning With Provable Sample Efficiency (2022)
- PID-inspired Inductive Biases For Deep Reinforcement Learning In Partially Observable Control Tasks (2023)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)