From Explainability To Interpretability: Interpretable Policies In Reinforcement Learning Via Model Explanation
2025 · Peilang Li, Umer Siddique, Yongcan Cao
Abstract
Deep reinforcement learning (RL) has shown remarkable success in complex domains; however, the inherent black-box nature of deep neural network policies raises significant challenges in understanding and trusting their decision-making processes. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: a novel method that applies Shapley values to policy interpretation beyond local explanations, and a general framework applicable to both off-policy and on-policy algorithms. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments. The re
Related papers
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- Collective Explainable AI: Explaining Cooperative Strategies And Agent Contribution In Multiagent Reinforcement Learning With Shapley Values (2021)
- Explainability In Deep Reinforcement Learning, A Review Into Current Methods And Applications (2022)
- A Survey On Interpretable Reinforcement Learning (2021)
- Explainability In Deep Reinforcement Learning (2020)
- A Theoretical Framework For Explaining Reinforcement Learning With Shapley Values (2025)
- Explainable Reinforcement Learning Via Model Transforms (2022)
- Generation Of Policy-level Explanations For Reinforcement Learning (2019)