Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
2017 · Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, et al.
Abstract
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
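The core mechanism can be illustrated with a small sketch, assuming initiation sets are stored as sets of (observation, previous option) pairs. Everything below (the `Option` class, `available`, `run_episode`, and the T-maze task) is a hypothetical illustration, not the authors' code or benchmark. Option policies map only the current observation to an action, and the only memory in the system is which option ran last.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

Obs = str
# Hypothetical OOI encoding: the set of (observation, previous option) pairs
# in which an option may start; None means "no option executed yet".
InitSet = FrozenSet[Tuple[Obs, Optional[str]]]


@dataclass(frozen=True)
class Option:
    name: str
    init: InitSet                      # option-observation initiation set
    policy: Callable[[Obs], str]       # memoryless: observation -> action
    terminates: Callable[[Obs], bool]  # termination condition on observations


def available(options: List[Option], obs: Obs, prev: Optional[str]) -> List[Option]:
    """Options whose initiation set admits (current observation, previous option)."""
    return [o for o in options if (obs, prev) in o.init]


def forward(_obs: Obs) -> str:
    return "forward"


def at_junction(obs: Obs) -> bool:
    return obs == "junction"


# Toy T-maze (hypothetical): the cue is visible only at the start, and the
# junction looks identical for both cues. The previous option carries the cue.
OPTIONS = [
    Option("go_L", frozenset({("cue_L", None)}), forward, at_junction),
    Option("go_R", frozenset({("cue_R", None)}), forward, at_junction),
    Option("turn_L", frozenset({("junction", "go_L")}), lambda _o: "left", lambda _o: True),
    Option("turn_R", frozenset({("junction", "go_R")}), lambda _o: "right", lambda _o: True),
]


def observe(pos: int, cue: str, length: int) -> Obs:
    if pos == 0:
        return "cue_" + cue
    return "junction" if pos == length else "corridor"


def run_episode(cue: str, length: int = 5) -> float:
    """Execute options until a turn ends the episode; +1 if the turn matches the cue."""
    pos, prev = 0, None
    while True:
        obs = observe(pos, cue, length)
        option = available(OPTIONS, obs, prev)[0]  # trivial top-level policy
        while True:
            action = option.policy(obs)
            if action != "forward":
                return 1.0 if {"left": "L", "right": "R"}[action] == cue else -1.0
            pos += 1
            obs = observe(pos, cue, length)
            if option.terminates(obs):
                break
        prev = option.name
```

Both `run_episode("L")` and `run_episode("R")` return `1.0`. A memoryless policy over raw observations sees the same `"junction"` observation for both cues and must make the same turn each time, so it can succeed for only one of them; restricting which options may start after `go_L` or `go_R` supplies the missing bit of memory, in the spirit of an FSC's internal node.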