Unraveling The Hidden Dynamical Structure In Recurrent Neural Policies
2026 · Jin Li, Yue Wu, Mengsha Huang, et al.
Abstract
Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their ability to maintain internal memory and adapt quickly to unseen scenarios has given them markedly better performance than non-recurrent counterparts. However, the mechanisms underlying their superior generalization and robustness remain poorly understood. In this study, we analyze the hidden-state space of recurrent policies learned across a diverse set of training methods, model architectures, and tasks, and find that stable cyclic structures consistently emerge during interaction with the environment. These cyclic structures closely resemble limit cycles in dynamical systems analysis when the policy and the environment are treated as a joint hybrid dynamical system. Moreover, we find that the geometry of these limit cycles has a structured correspondence with the policies' behaviors. These findings offer new per
Related papers
- Recurrent Networks, Hidden States And Beliefs In Partially Observable Environments (2022)
- Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy (2022)
- Addressing Action Oscillations Through Learning Policy Inertia (2021)
- Dynamic Deep-reinforcement-learning Algorithm In Partially Observable Markov Decision Processes (2023)
- Learning Nonlinear Causal Reductions To Explain Reinforcement Learning Policies (2025)
- Recurrent Predictive State Policy Networks (2018)
- Recurrent World Models Facilitate Policy Evolution (2018)
- Learning Self-imitating Diverse Policies (2018)