RACCER: Towards Reachable And Certain Counterfactual Explanations For Reinforcement Learning
2023 · Jasmina Gajcin, Ivana Dusparic
Abstract
While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals ba
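The two properties the abstract names, reachability and certainty of the desired outcome, can be illustrated on a toy problem. The sketch below is not the RACCER implementation: it swaps the paper's heuristic tree search for an exhaustive enumeration of short action sequences. Everything in it is a hypothetical stand-in chosen for illustration, including the one-dimensional environment, `toy_policy`, `P_SUCCESS`, and the scoring rule with `cost_weight`. It scores each action sequence by its length (a proxy for reachability) and by the exact probability that the agent ends in a state where it would pick the desired action (a proxy for certainty).

```python
from itertools import product

# All names and numbers below are illustrative assumptions, not from the paper.
ACTIONS = (-1, +1)   # step left / step right
N_STATES = 7         # states 0..6 on a line
P_SUCCESS = 0.8      # a move succeeds with this probability, else the agent stays


def toy_policy(state):
    """Stand-in for a black-box agent: go right below state 4, else left."""
    return +1 if state < 4 else -1


def step_distribution(dist, action):
    """Push a {state: probability} distribution through one stochastic action."""
    out = {}
    for state, prob in dist.items():
        target = min(max(state + action, 0), N_STATES - 1)
        out[target] = out.get(target, 0.0) + prob * P_SUCCESS
        out[state] = out.get(state, 0.0) + prob * (1.0 - P_SUCCESS)
    return out


def find_counterfactual(start, desired_action, max_depth=4, cost_weight=0.1):
    """Return (score, action_sequence, certainty) with the lowest score.

    score = cost_weight * sequence length + (1 - certainty), an assumed
    trade-off between how easy the counterfactual is to reach and how
    likely it is to yield the desired action.
    """
    best = None
    for depth in range(1, max_depth + 1):
        for seq in product(ACTIONS, repeat=depth):
            dist = {start: 1.0}
            for action in seq:
                dist = step_distribution(dist, action)
            # Probability mass on final states where the agent picks the desired action.
            certainty = sum(p for s, p in dist.items()
                            if toy_policy(s) == desired_action)
            score = cost_weight * depth + (1.0 - certainty)
            if best is None or score < best[0]:
                best = (score, seq, certainty)
    return best


if __name__ == "__main__":
    score, seq, certainty = find_counterfactual(start=2, desired_action=-1)
    print(seq, round(certainty, 3))
```

With these toy numbers, the two-move sequence is the shortest one that can reach a state where the agent moves left, but it succeeds only when both moves succeed. A longer sequence can score better because its higher certainty outweighs the extra step, which is the kind of trade-off that a purely distance-based counterfactual would miss in a stochastic task.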
Related papers
- Redefining Counterfactual Explanations For Reinforcement Learning: Overview, Challenges And Opportunities (2022) 0.00
- ACTER: Diverse And Actionable Counterfactual Sequences For Explaining And Diagnosing RL Policies (2024) 0.00
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies Through Visual Counterfactual Explanations (2023) 2.26
- Counterfactual State Explanations For Reinforcement Learning Agents Via Generative Deep Learning (2021) 13.23
- Experiential Explanations For Reinforcement Learning (2022) 2.26
- Learning Impartial Policies For Sequential Counterfactual Explanations Using Deep Reinforcement Learning (2023) 0.00
- ReCCoVER: Detecting Causal Confusion For Explainable Reinforcement Learning (2022) 0.00
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes (2023) 5.84