Perception-Prediction-Reaction Agents for Deep Reinforcement Learning
2020 · Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, et al.
Abstract
We introduce a new recurrent agent architecture and associated auxiliary losses which improve reinforcement learning in partially observable tasks requiring long-term memory. We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry. The *reaction* core incorporates new observations with input from the slow core to produce the agent's policy; the *perception* core accesses only short-term observations and informs the slow core; lastly, the *prediction* core accesses only long-term memory. An auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory. We present the resulting *Perception-Prediction-Reaction* (PPR) agent and demonstrate its improved performance over a strong LSTM-agent baseline in DM
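The sketch below illustrates two ideas from the abstract. It is not the authors' implementation. It records which signals each core may read, which is the information asymmetry. It also shows when a slow core updating every `period` steps would tick, and it gives one plausible form of the policy-regularization loss. The names `CORE_INPUTS`, `slow_core_ticks`, `kl`, and `ppr_aux_loss` are hypothetical, and so are the fixed tick period and the loss form. The loss here is an equally weighted symmetric pairwise KL divergence between the three cores' categorical policies. The paper's exact KL direction, weighting, and stop-gradient placement may differ.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical wiring table read off the abstract: which signals each
# recurrent core may consume ("obs" = current observation, "slow" = slow
# core state). Each core is also assumed to read its own previous state.
CORE_INPUTS: Dict[str, List[str]] = {
    "perception": ["obs"],         # short-term observations only
    "slow": ["perception"],        # informed by the perception core
    "reaction": ["obs", "slow"],   # new observations + long-term memory -> acting policy
    "prediction": ["slow"],        # long-term memory only
}


def slow_core_ticks(num_steps: int, period: int) -> List[int]:
    """Time steps at which a slow core that updates every `period` steps ticks.

    Assumption: a fixed period; fast cores are taken to tick on every step.
    """
    return [t for t in range(num_steps) if t % period == 0]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions over the same action set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def ppr_aux_loss(pi_reaction: Sequence[float],
                 pi_perception: Sequence[float],
                 pi_prediction: Sequence[float]) -> float:
    """Sum of KL terms over all ordered pairs of the three cores' policies.

    Summing over ordered pairs gives a symmetric penalty, which is zero only
    when all three policies agree. This is an illustrative assumption, not
    the paper's exact auxiliary loss.
    """
    policies = [pi_reaction, pi_perception, pi_prediction]
    total = 0.0
    for i, p in enumerate(policies):
        for j, q in enumerate(policies):
            if i != j:
                total += kl(p, q)
    return total


if __name__ == "__main__":
    print(slow_core_ticks(10, 4))  # [0, 4, 8]
    same = [0.25, 0.25, 0.5]
    print(ppr_aux_loss(same, same, same))  # 0.0
    print(ppr_aux_loss([0.9, 0.1], [0.1, 0.9], [0.5, 0.5]) > 0)  # True
```

Under this wiring, the prediction core never sees observations directly. Its policy can match the reaction core's policy only through what the slow core has stored. Penalizing the disagreement therefore pushes useful long-horizon information into long-term memory.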
Related papers
- Dynamic Deep-reinforcement-learning Algorithm In Partially Observable Markov Decision Processes (2023)
- On Improving Deep Reinforcement Learning For POMDPs (2017)
- Real-time Recurrent Reinforcement Learning (2023)
- The Reactor: A Fast And Sample-efficient Actor-critic Agent For Reinforcement Learning (2017)
- Low-pass Recurrent Neural Networks - A Memory Architecture For Longer-term Correlation Discovery (2018)
- Stable Hadamard Memory: Revitalizing Memory-augmented Agents For Reinforcement Learning (2024)
- Data-efficient Reinforcement Learning With Self-predictive Representations (2020)
- Influence-aware Memory Architectures For Deep Reinforcement Learning (2019)