Explainable Reinforcement Learning Via Model Transforms
2022 · Mira Finkelstein, Lucy Liu, Nitsan Levy Schlot, et al.
Abstract
Understanding emerging behaviors of reinforcement learning (RL) agents may be difficult since such agents are often trained in complex environments using highly complex decision-making procedures. This has given rise to a variety of approaches to explainability in RL that aim to reconcile discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. Most recent approaches have relied either on domain knowledge that may not always be available, on an analysis of the agent's policy, or on an analysis of specific elements of the underlying environment, typically modeled as a Markov Decision Process (MDP). Our key claim is that even if the underlying model is not fully known (e.g., the transition probabilities have not been accurately learned) or is not maintained by the agent (i.e., when using model-free methods), the model can nevertheless be exploited to automatically generate explanations. For this purpose, we suggest using formal MD
Related papers
- Contrastive Explanations For Reinforcement Learning In Terms Of Expected Consequences (2018)
- Experiential Explanations For Reinforcement Learning (2022)
- Causal State Distillation For Explainable Reinforcement Learning (2023)
- From Explainability To Interpretability: Interpretable Policies In Reinforcement Learning Via Model Explanation (2025)
- Explainable Reinforcement Learning Via A Causal World Model (2023)
- Explainable Reinforcement Learning Agents Using World Models (2025)
- Utilizing Explainability Techniques For Reinforcement Learning Model Assurance (2023)
- Why The Agent Made That Decision: Contrastive Explanation Learning For Reinforcement Learning (2024)