Learning Impartial Policies For Sequential Counterfactual Explanations Using Deep Reinforcement Learning
2023 · E. Panagiotou, E. Ntoutsi
Abstract
In the field of explainable Artificial Intelligence (XAI), sequential counterfactual (SCF) examples are often used to alter the decision of a trained classifier by implementing a sequence of modifications to the input instance. Although certain test-time algorithms aim to optimize for each new instance individually, recently Reinforcement Learning (RL) methods have been proposed that seek to learn policies for discovering SCFs, thereby enhancing scalability. As is typical in RL, the formulation of the RL problem, including the specification of state space, actions, and rewards, can often be ambiguous. In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions. We propose to use the output probabilities of the classifier to create a more informative reward, to mitigate this effect.
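To illustrate the idea of a reward built from classifier output probabilities, here is a minimal sketch. It is not the authors' formulation: the toy logistic classifier, the function names (`predict_proba`, `sparse_reward`, `probability_reward`), the weights, and the `action_cost` parameter are all assumptions made for illustration. It contrasts a sparse reward, which only signals when the decision flips, with a dense reward equal to the change in the target-class probability after one modification step.

```python
import numpy as np

def predict_proba(x, w, b):
    """Hypothetical logistic classifier: probability of the desired class."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def sparse_reward(x_next, w, b, threshold=0.5):
    """Reward of 1 only once the classifier decision has flipped, else 0."""
    return 1.0 if predict_proba(x_next, w, b) >= threshold else 0.0

def probability_reward(x, x_next, w, b, action_cost=0.0):
    """Dense reward: gain in target-class probability minus an assumed action cost."""
    return predict_proba(x_next, w, b) - predict_proba(x, w, b) - action_cost

# Assumed toy weights and a single modification step on the first feature.
w = np.array([1.0, -1.0])
b = -2.0
x = np.array([0.0, 0.0])
x_next = np.array([1.0, 0.0])

r_sparse = sparse_reward(x_next, w, b)
r_dense = probability_reward(x, x_next, w, b)
print(r_sparse, r_dense)
```

In this toy setting the step does not yet flip the decision, so the sparse reward gives the agent no signal, while the probability-based reward is positive because the step moved the instance toward the desired class.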
Related papers
- ACTER: Diverse And Actionable Counterfactual Sequences For Explaining And Diagnosing RL Policies (2024)
- Redefining Counterfactual Explanations For Reinforcement Learning: Overview, Challenges And Opportunities (2022)
- Experiential Explanations For Reinforcement Learning (2022)
- RACCER: Towards Reachable And Certain Counterfactual Explanations For Reinforcement Learning (2023)
- Sample-efficient Reinforcement Learning Via Counterfactual-based Data Augmentation (2020)
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies Through Visual Counterfactual Explanations (2023)
- Counterfactually Fair Reinforcement Learning Via Sequential Data Preprocessing (2025)
- Counterfactual State Explanations For Reinforcement Learning Agents Via Generative Deep Learning (2021)