How Do You Act? An Empirical Study To Understand Behavior Of Deep Reinforcement Learning Agents
2020 · Richard Meyes, Moritz Schneider, Tobias Meisen
Abstract
The demand for more transparency in the decision-making processes of deep reinforcement learning agents is greater than ever, due to their increased use in safety-critical and ethically challenging domains such as autonomous driving. In this empirical study, we address this lack of transparency by following an idea inspired by research in the field of neuroscience. We characterize the learned representations of an agent's policy network through its activation space and perform partial network ablations to compare the representations of the healthy and the intentionally damaged networks. We show that the healthy agent's behavior is characterized by a distinct correlation pattern between the network's layer activation and the performed actions during an episode, and that network ablations which cause a strong change of this pattern lead to the agent failing its trained control task. Furthermore, the learned representation of the healthy agent is characterized by a distinct pattern in i
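The ablation-and-correlation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-layer network, its random weights, the layer sizes, the choice of Pearson correlation, and the names `forward` and `activation_action_correlation` are all assumptions made for illustration, and both networks are fed the same observations rather than each acting in its own episode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical policy network with random weights (stand-in for a trained agent).
W1 = rng.normal(size=(4, 16))   # observation (4) -> hidden layer (16)
W2 = rng.normal(size=(16, 2))   # hidden layer (16) -> action logits (2)

def forward(obs, mask=None):
    """Return hidden activations and greedy actions; mask zeroes ablated units."""
    hidden = np.tanh(obs @ W1)
    if mask is not None:
        hidden = hidden * mask
    actions = np.argmax(hidden @ W2, axis=1)
    return hidden, actions

def activation_action_correlation(hidden, actions):
    """Pearson correlation between each hidden unit's activation and the action taken."""
    h = hidden - hidden.mean(axis=0)
    a = actions - actions.mean()
    denom = np.sqrt((h ** 2).sum(axis=0) * (a ** 2).sum())
    # Units with constant activation (for example ablated ones) get correlation 0.
    return np.divide(h.T @ a, denom, out=np.zeros_like(denom), where=denom > 0)

# Stand-in for the observations of one episode (assumption: random data).
obs = rng.normal(size=(500, 4))

# Partial ablation: silence 4 of the 16 hidden units.
ablated = rng.choice(16, size=4, replace=False)
mask = np.ones(16)
mask[ablated] = 0.0

h_healthy, a_healthy = forward(obs)
h_damaged, a_damaged = forward(obs, mask)

corr_healthy = activation_action_correlation(h_healthy, a_healthy)
corr_damaged = activation_action_correlation(h_damaged, a_damaged)

# Mean absolute change of the correlation pattern caused by the ablation.
pattern_shift = np.abs(corr_healthy - corr_damaged).mean()
print("mean change in correlation pattern:", pattern_shift)
```

In a setting closer to the study, the weights would come from a trained agent, and the healthy and damaged networks would each be rolled out in the environment so that the change in the correlation pattern can be compared against task performance.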
Related papers
- Agent Modelling Under Partial Observability For Deep Reinforcement Learning (2020)
- REVEAL-IT: Reinforcement Learning With Visibility Of Evolving Agent Policy For Interpretability (2024)
- Studying The Interplay Between The Actor And Critic Representations In Reinforcement Learning (2025)
- Towards Governing Agent's Efficacy: Action-conditional β-VAE For Deep Transparent Reinforcement Learning (2018)
- Investigating The Properties Of Neural Network Representations In Reinforcement Learning (2022)
- Analysing Factorizations Of Action-value Networks For Cooperative Multi-agent Reinforcement Learning (2019)
- Interpretable Learning Dynamics In Unsupervised Reinforcement Learning (2025)
- Can You See How I Learn? Human Observers' Inferences About Reinforcement Learning Agents' Learning Processes (2025)