Explaining Reinforcement Learning: A Counterfactual Shapley Values Approach
2024 · Yiwei Shi, Qi Zhang, Kevin McAreavey, et al.
Abstract
This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these contributions more accurately, we introduce two new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions are used to calculate Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that this method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.
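The abstract does not spell out the two characteristic value functions, so the following is only a minimal sketch of the general idea, not the paper's definitions. It computes exact Shapley values over state dimensions, using a stand-in characteristic function that measures the gap in Q-value between a preferred action and an alternative when only a subset of state dimensions keeps its observed value. The toy `q_value` function, the `baseline` state used for masked dimensions, and all names below are assumptions introduced for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1 given v(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that excludes i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of dimension i to coalition s
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def q_value(state, action):
    """Hypothetical toy Q-function over a 2-dimensional state (assumption)."""
    x, y = state
    if action == "right":
        return 2.0 * x + 0.5 * y
    return 1.0 * x + 1.5 * y  # any other action, e.g. "up"


def make_difference_function(state, baseline, best, alt):
    """Stand-in characteristic function (assumption, not the paper's):
    Q(best) - Q(alt) with dimensions outside the coalition set to baseline."""
    def v(coalition):
        masked = tuple(
            state[i] if i in coalition else baseline[i]
            for i in range(len(state))
        )
        return q_value(masked, best) - q_value(masked, alt)
    return v


state = (3.0, 1.0)
baseline = (0.0, 0.0)
v = make_difference_function(state, baseline, best="right", alt="up")
phi = shapley_values(len(state), v)
print(phi)
```

In this toy setup the first dimension pushes toward "right" and the second pushes toward "up", and the attributions sum to the full difference `v({0, 1}) - v({})`, which is the efficiency property of Shapley values. Exact enumeration grows exponentially with the number of state dimensions, so larger states would need a sampling approximation.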
Related papers
- Explaining Reinforcement Learning With Shapley Values (2023)
- A Theoretical Framework For Explaining Reinforcement Learning With Shapley Values (2025)
- Collective Explainable AI: Explaining Cooperative Strategies And Agent Contribution In Multiagent Reinforcement Learning With Shapley Values (2021)
- From Explainability To Interpretability: Interpretable Policies In Reinforcement Learning Via Model Explanation (2025)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes (2023)
- SHAQ: Incorporating Shapley Value Theory Into Multi-agent Q-learning (2021)
- The Shapley Value In Machine Learning (2022)
- Redefining Counterfactual Explanations For Reinforcement Learning: Overview, Challenges And Opportunities (2022)