Integrating Policy Summaries With Reward Decomposition For Explaining Reinforcement Learning Agents
2022 · Yael Septon, Tobias Huber, Elisabeth André, et al.
Abstract
Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations that analyze specific decisions of the agents and global explanations that convey the general strategy of the agents. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and their respective benefits. Our results show significant benefits for both methods. In general, we found
Related papers
- Local And Global Explanations Of Agent Behavior: Integrating Strategy Summaries With Saliency Maps (2020)
- Experiential Explanations For Reinforcement Learning (2022)
- (When) Are Contrastive Explanations Of Reinforcement Learning Helpful? (2022)
- Generating Explanations From Deep Reinforcement Learning Using Episodic Memory (2022)
- Generation Of Policy-level Explanations For Reinforcement Learning (2019)
- Causal State Distillation For Explainable Reinforcement Learning (2023)
- REVEAL-IT: Reinforcement Learning With Visibility Of Evolving Agent Policy For Interpretability (2024)
- From Explainability To Interpretability: Interpretable Policies In Reinforcement Learning Via Model Explanation (2025)