Explaining Deep Reinforcement Learning Agents In The Atari Domain Through A Surrogate Model
2021 · Alexander Sieusahai, Matthew Guzdial
Abstract
One major barrier to applications of deep Reinforcement Learning (RL), both inside and outside of games, is the lack of explainability. In this paper, we describe a lightweight and effective method to derive explanations for deep RL agents, which we evaluate in the Atari domain. Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation. We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target deep RL agent. Our experiments demonstrate that we can learn an effective surrogate that accurately approximates the underlying decision-making of a target agent on a suite of Atari games.
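The surrogate idea in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not say which surrogate model or percept features the authors use, so the example swaps in a one-rule decision stump, a hand-written `target_agent_action` function as a stand-in for a trained deep RL agent, and two hypothetical percept features (`ball_x`, `paddle_x`). It shows only the general pattern: record the target agent's actions on interpretable percepts, fit an interpretable model to imitate them, and measure how often the two agree (fidelity).

```python
import random

# Hypothetical interpretable features computed from a percept (ball_x, paddle_x).
# These names are illustrative assumptions, not the paper's representation.
FEATURES = {
    "ball_x": lambda p: p[0],
    "paddle_x": lambda p: p[1],
    "ball_x - paddle_x": lambda p: p[0] - p[1],
}

def target_agent_action(percept):
    """Stand-in for the target deep RL agent (assumption: a fixed hand-written rule)."""
    ball_x, paddle_x = percept
    return "RIGHT" if ball_x > paddle_x else "LEFT"

def sample_percepts(n, seed):
    """Draw n random percepts with x coordinates in a 160-pixel-wide frame."""
    rng = random.Random(seed)
    return [(rng.randint(0, 159), rng.randint(0, 159)) for _ in range(n)]

def predict(rule, percept):
    """Apply a one-rule surrogate: compare one feature against a threshold."""
    value = FEATURES[rule["feature"]](percept)
    return rule["above"] if value > rule["threshold"] else rule["otherwise"]

def fidelity(rule, percepts, actions):
    """Fraction of percepts on which the surrogate matches the target agent."""
    matches = sum(predict(rule, p) == a for p, a in zip(percepts, actions))
    return matches / len(percepts)

def fit_stump(percepts, actions):
    """Exhaustively search features, thresholds, and labels for the best single rule."""
    labels = sorted(set(actions))
    best_rule, best_score = None, -1.0
    for name, feature in FEATURES.items():
        for threshold in sorted({feature(p) for p in percepts}):
            for above in labels:
                for otherwise in labels:
                    rule = {"feature": name, "threshold": threshold,
                            "above": above, "otherwise": otherwise}
                    score = fidelity(rule, percepts, actions)
                    if score > best_score:
                        best_rule, best_score = rule, score
    return best_rule

# Record the target agent's behavior, fit the surrogate, then check held-out fidelity.
train_percepts = sample_percepts(200, seed=0)
train_actions = [target_agent_action(p) for p in train_percepts]
rule = fit_stump(train_percepts, train_actions)

test_percepts = sample_percepts(200, seed=1)
test_actions = [target_agent_action(p) for p in test_percepts]
train_fidelity = fidelity(rule, train_percepts, train_actions)
test_fidelity = fidelity(rule, test_percepts, test_actions)

print(f"if {rule['feature']} > {rule['threshold']}: {rule['above']} else: {rule['otherwise']}")
print(f"train fidelity = {train_fidelity:.3f}, held-out fidelity = {test_fidelity:.3f}")
```

The learned rule is itself the explanation: it can be read directly, and the held-out fidelity indicates how far that explanation can be trusted as a description of the target agent. A real target agent would act on pixels, so the percepts would first have to be extracted from game frames.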
Related papers
- Explainable Deep Reinforcement Learning Using Introspection In A Non-episodic Task (2021)
- Visualizing And Understanding Atari Agents (2017)
- Counterfactual State Explanations For Reinforcement Learning Agents Via Generative Deep Learning (2021)
- Learn To Interpret Atari Agents (2018)
- Model-based Reinforcement Learning For Atari (2019)
- Counterfactual States For Atari Agents Via Generative Deep Learning (2019)
- Exploratory Not Explanatory: Counterfactual Analysis Of Saliency Maps For Deep Reinforcement Learning (2019)
- Reconstructing Actions To Explain Deep Reinforcement Learning (2020)