Explaining Reinforcement Learning Policies Through Counterfactual Trajectories
2022 · Julius Frost, Olivia Watkins, Eric Weiner, et al.
Abstract
For humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test time. Some policy interpretability methods facilitate this by capturing the policy's decision-making in a set of agent rollouts. However, even the most informative trajectories of training-time behavior may give little insight into the agent's behavior out of distribution. In contrast, our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution. We generate these trajectories by guiding the agent to more diverse unseen states and showing the agent's behavior there. In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
Related papers
- Explaining RL Decisions With Trajectories (2023)
- Explaining Conditions For Reinforcement Learning Behaviors From Real And Imagined Data (2020)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes (2023)
- Experiential Explanations For Reinforcement Learning (2022)
- REACT: Revealing Evolutionary Action Consequence Trajectories For Interpretable Reinforcement Learning (2024)
- Bad Habits: Policy Confounding And Out-of-trajectory Generalization In RL (2023)
- Explaining Learned Reward Functions With Counterfactual Trajectories (2024)
- Learning Impartial Policies For Sequential Counterfactual Explanations Using Deep Reinforcement Learning (2023)