Real-time Recurrent Reinforcement Learning
2023 · Julian Lemmel, Radu Grosu
Abstract
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that exploits temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
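As an illustration of the second ingredient only (temporal difference learning with eligibility traces), the following is a minimal sketch and not the authors' implementation. It assumes a toy five-state random-walk task and a tabular value function; the function name `td_lambda_random_walk` and all hyperparameters are hypothetical choices made for this example. The recurrent backbone and the online gradient computation described in the abstract are not shown.

```python
import numpy as np

def td_lambda_random_walk(n_states=5, episodes=3000, alpha=0.02,
                          gamma=1.0, lam=0.8, seed=0):
    """Estimate state values of a random-walk chain with online TD(lambda).

    Assumed toy task (not from the paper): the agent starts in the middle
    state, steps left or right with equal probability, and receives reward 1
    only when it exits on the right side.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_states)           # one value estimate per state
    for _ in range(episodes):
        e = np.zeros(n_states)       # eligibility trace, reset every episode
        s = n_states // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            done = s_next < 0 or s_next >= n_states
            r = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else w[s_next]
            delta = r + gamma * v_next - w[s]   # TD error
            e *= gamma * lam                    # decay all traces
            e[s] += 1.0                         # accumulate trace for the visited state
            w += alpha * delta * e              # credit recently visited states
            if done:
                break
            s = s_next
    return w

values = td_lambda_random_walk()
print(np.round(values, 2))
```

The update needs only the current TD error and a decaying trace of recently visited states, so it runs fully online without storing trajectories. For this chain the exact values are 1/6, 2/6, ..., 5/6, which the estimates should approach up to sampling noise.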
Related papers
- Dynamic Deep-reinforcement-learning Algorithm in Partially Observable Markov Decision Processes (2023) 0.00
- Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes (2018) 12.87
- Variational Recurrent Models for Solving Partially Observable Control Tasks (2019) 0.00
- On Improving Deep Reinforcement Learning for POMDPs (2017) 0.00
- Recurrent Natural Policy Gradient for POMDPs (2024) 0.00
- Perception-prediction-reaction Agents for Deep Reinforcement Learning (2020) 0.00
- Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning (2023) 0.00
- Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization (2022) 0.00