A Survey On Interpretable Reinforcement Learning
2021 · Claire Glanois, Paul Weng, Matthieu Zimmer, et al.
Abstract
Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that end, we distinguish interpretability (as a property of a model) from explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL, with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to in
Related papers
- A Survey On Explainable Reinforcement Learning: Concepts, Algorithms, Challenges (2022)
- Evaluating Interpretable Reinforcement Learning By Distilling Policies Into Programs (2025)
- Explainable Reinforcement Learning: A Survey (2020)
- From Explainability To Interpretability: Interpretable Policies In Reinforcement Learning Via Model Explanation (2025)
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- A Survey Of Explainable Reinforcement Learning (2022)
- Social Interpretable Reinforcement Learning (2024)
- Towards A Research Community In Interpretable Reinforcement Learning: The InterpPol Workshop (2024)