Recurrent Predictive State Policy Networks
2018 · Ahmed Hefny, Zita Marinho, Wen Sun, et al.
Abstract
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling the predictive state: a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. The predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying traini
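The two-component structure described in the abstract (a recursive filter that carries state forward, plus a memoryless policy that reads that state) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names `RecursiveFilter` and `ReactivePolicy`, the tanh-of-linear update, the linear policy, and the random weights are hypothetical stand-ins, not the PSR-based filter or the initialization procedure from the paper.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a placeholder for a learned or initialized filter."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RecursiveFilter:
    """Hypothetical filter: q_next = tanh(W [q; a; o]).

    This is NOT the PSR update from the paper, only a generic recursive
    state update with the same inputs (state, action, observation).
    """

    def __init__(self, state_dim, action_dim, obs_dim, rng):
        self.W = random_matrix(state_dim, state_dim + action_dim + obs_dim, rng)
        self.q0 = [0.0] * state_dim  # assumed initial state

    def update(self, q, action, obs):
        return [math.tanh(v) for v in matvec(self.W, q + action + obs)]


class ReactivePolicy:
    """Memoryless policy: the action depends only on the current state q."""

    def __init__(self, state_dim, action_dim, rng):
        self.K = random_matrix(action_dim, state_dim, rng)

    def act(self, q):
        return matvec(self.K, q)


def rollout(filt, policy, observations):
    """Alternate between acting on the current state and filtering."""
    q = filt.q0
    actions = []
    for obs in observations:
        a = policy.act(q)           # the policy sees only q, no history
        actions.append(a)
        q = filt.update(q, a, obs)  # the filter folds (a, o) into the state
    return actions, q


rng = random.Random(0)
filt = RecursiveFilter(state_dim=4, action_dim=2, obs_dim=3, rng=rng)
policy = ReactivePolicy(state_dim=4, action_dim=2, rng=rng)
obs_seq = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
actions, final_q = rollout(filt, policy, obs_seq)
```

The point of the split is that all memory lives in the filter state `q`. The policy itself keeps no hidden state, which is what the abstract means by a purely reactive policy.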
Related papers
- Recurrent Networks, Hidden States And Beliefs In Partially Observable Environments (2022)
- Data-efficient Reinforcement Learning With Self-predictive Representations (2020)
- Learning Interpretable Policies In Hindsight-observable POMDPs Through Partially Supervised Reinforcement Learning (2024)
- Dynamic Deep-reinforcement-learning Algorithm In Partially Observable Markov Decision Processes (2023)
- Unraveling The Hidden Dynamical Structure In Recurrent Neural Policies (2026)
- Policy Prediction Network: Model-free Behavior Policy With Model-based Learning In Continuous Action Space (2019)
- Perception-prediction-reaction Agents For Deep Reinforcement Learning (2020)
- Efficient Deep Reinforcement Learning With Predictive Processing Proximal Policy Optimization (2022)