ACTER: Diverse And Actionable Counterfactual Sequences For Explaining And Diagnosing RL Policies
2024 · Jasmina Gajcin, Ivana Dusparic
Abstract
Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL explain an outcome using only the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provides actionable advice on how failure can be avoided. ACTER investigates the actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty even in stochastic
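The trade-off the abstract describes, between changing few actions and avoiding failure with high certainty, can be illustrated with a small sketch. This is not the authors' implementation and it is not a full NSGA-II run: it only computes the non-dominated (Pareto) set that NSGA-II's sorting step is built around, over an exhaustively enumerated candidate set. The corridor environment, the slip probability, and all function names below are hypothetical and chosen purely for illustration.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions where two equal-length action sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def rollout_fails(actions, rng, slip=0.1, cliff=3):
    """Toy stochastic corridor (assumed, not from the paper).

    Action 1 steps right, action 0 stays. With probability `slip` the
    chosen action is replaced by a step right. Reaching `cliff` is failure.
    """
    pos = 0
    for a in actions:
        if rng.random() < slip:
            a = 1
        pos += a
        if pos >= cliff:
            return True
    return False


def failure_rate(actions, rng, n=200):
    """Monte Carlo estimate of how often a sequence still ends in failure."""
    return sum(rollout_fails(actions, rng) for _ in range(n)) / n


def dominates(p, q):
    """True if p is no worse than q on every objective and better on one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(scored):
    """Keep the (candidate, objectives) pairs that no other pair dominates."""
    return [(c, s) for c, s in scored
            if not any(dominates(t, s) for _, t in scored)]


# The failing sequence to explain: three steps right walk off the cliff.
original = (1, 1, 1)
rng = random.Random(0)

# Enumerate every alternative sequence; a real evolutionary search would
# sample and mutate candidates instead of enumerating them.
candidates = [c for c in itertools.product((0, 1), repeat=3) if c != original]

# Both objectives are minimised: changes to the original, and failure rate.
scored = [(c, (hamming(c, original), failure_rate(c, rng))) for c in candidates]
front = pareto_front(scored)

for seq, (changes, fail) in sorted(front, key=lambda item: item[1]):
    print(seq, "changes:", changes, "estimated failure rate:", fail)
```

Each sequence on the front is a candidate piece of actionable advice: no other candidate both changes fewer actions and fails less often in the simulated rollouts.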
Related papers
- Learning Impartial Policies For Sequential Counterfactual Explanations Using Deep Reinforcement Learning (2023)
- RACCER: Towards Reachable And Certain Counterfactual Explanations For Reinforcement Learning (2023)
- Redefining Counterfactual Explanations For Reinforcement Learning: Overview, Challenges And Opportunities (2022)
- Experiential Explanations For Reinforcement Learning (2022)
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies Through Visual Counterfactual Explanations (2023)
- REACT: Revealing Evolutionary Action Consequence Trajectories For Interpretable Reinforcement Learning (2024)
- Counterfactual State Explanations For Reinforcement Learning Agents Via Generative Deep Learning (2021)
- Learning Nonlinear Causal Reductions To Explain Reinforcement Learning Policies (2025)