Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes
2023 · Yotam Amitai, Yael Septon, Ofra Amir
Abstract
Explainable reinforcement learning (XRL) methods aim to help elucidate agent policies and decision-making processes. The majority of XRL approaches focus on local explanations, seeking to shed light on the reasons an agent acts the way it does at a specific world state. While such explanations are both useful and necessary, they typically do not portray the outcomes of the agent's selected action. In this work, we propose "COViz", a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one. In contrast to most local explanations, which provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes. We evaluated the usefulness of COViz in supporting people's understanding of agents' preferences and compared it with reward decomposition, a local explanation method that describes an agent's expected utility for dif
Related papers
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies Through Visual Counterfactual Explanations (2023)
- Experiential Explanations For Reinforcement Learning (2022)
- Why The Agent Made That Decision: Contrastive Explanation Learning For Reinforcement Learning (2024)
- What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes (2020)
- Redefining Counterfactual Explanations For Reinforcement Learning: Overview, Challenges And Opportunities (2022)
- TalkToAgent: A Human-centric Explanation Of Reinforcement Learning Agents With Large Language Models (2025)
- Explainable Reinforcement Learning Agents Using World Models (2025)
- A Survey Of Explainable Reinforcement Learning (2022)