Recurrent Networks, Hidden States And Beliefs In Partially Observable Environments
2022 · Gaspard Lambrechts, Adrien Bolland, Damien Ernst
Abstract
Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown. Many methods rely on the approximation of a value function to derive near-optimal policies. In partially observable environments, these functions depend on the complete sequence of observations and past actions, called the history. In this work, we show empirically that recurrent neural networks trained to approximate such value functions internally filter the posterior probability distribution of the current state given the history, called the belief. More precisely, we show that, as a recurrent neural network learns the Q-function, its hidden states become more and more correlated with the beliefs of state variables that are relevant to optimal control. This correlation is measured through their mutual information. In addition, we show that the expected return of an agent increases with the ability of its recurrent architecture to reach a high mutual information between its hidden states and the beliefs.
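The belief mentioned above is the output of a recursive Bayes filter over the history. The sketch below shows that filter for a small discrete POMDP whose transition and observation tables are assumed known; the names `belief_update`, `filter_history`, `T_LISTEN` and `O_LISTEN`, and the two-state "listen" model, are hypothetical and serve only as illustration. This is not the authors' code: in their setting the dynamics are unknown and the recurrent network is never given these tables.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical tabular POMDP used only for illustration:
#   T[a][s][s2] = P(s2 | s, a)   (transition model)
#   O[a][s2][o] = P(o | s2, a)   (observation model)
Matrix = List[List[float]]


def belief_update(belief: Sequence[float], action: int, obs: int,
                  T: Sequence[Matrix], O: Sequence[Matrix]) -> List[float]:
    """One Bayes-filter step: b'(s2) is proportional to O(o | s2, a) * sum_s T(s2 | s, a) b(s)."""
    n = len(belief)
    # Prediction: push the current belief through the transition model.
    predicted = [sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correction: weight each next state by the likelihood of the observation.
    unnormalized = [O[action][s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the belief is a probability distribution again.
    return [p / z for p in unnormalized]


def filter_history(b0: Sequence[float], history: Iterable[Tuple[int, int]],
                   T: Sequence[Matrix], O: Sequence[Matrix]) -> List[float]:
    """Fold the belief update over a history of (action, observation) pairs."""
    b = list(b0)
    for action, obs in history:
        b = belief_update(b, action, obs, T, O)
    return b


# Hypothetical two-state model: the single action 0 ("listen") leaves the state
# unchanged, and the observation reports the true state with probability 0.85.
T_LISTEN = [[[1.0, 0.0], [0.0, 1.0]]]
O_LISTEN = [[[0.85, 0.15], [0.15, 0.85]]]

if __name__ == "__main__":
    # Two observations of state 0 and one of state 1 leave a net belief of ~0.85 in state 0.
    b = filter_history([0.5, 0.5], [(0, 0), (0, 0), (0, 1)], T_LISTEN, O_LISTEN)
    print(b)  # approximately [0.85, 0.15]
```

The filter has the recursive form b_t = τ(b_{t-1}, a_{t-1}, o_t), which matches the shape of a recurrent update h_t = f(h_{t-1}, a_{t-1}, o_t). That shared structure is the intuition behind comparing hidden states with beliefs. The comparison itself, through mutual information, is not reproduced here.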
Related papers
- Belief States For Cooperative Multi-agent Reinforcement Learning Under Partial Observability (2025)
- Dynamic Deep-reinforcement-learning Algorithm In Partially Observable Markov Decision Processes (2023)
- Recurrent Predictive State Policy Networks (2018)
- Approximate Information State Based Convergence Analysis Of Recurrent Q-learning (2023)
- Information State Embedding In Partially Observable Cooperative Multi-agent Reinforcement Learning (2020)
- Reinforcement Learning Under Partial Observability Guided By Learned Environment Models (2022)
- Unraveling The Hidden Dynamical Structure In Recurrent Neural Policies (2026)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)