A Framework For Understanding And Visualizing Strategies Of RL Agents
2022 · Pedro Sequeira, Daniel Elenius, Jesse Hostetler, et al.
Abstract
Recent years have seen significant advances in explainable AI as the need to understand deep learning models has gained importance with the increased emphasis on trust and ethics in AI. Comprehensible models for sequential decision tasks are a particular challenge as they require understanding not only individual predictions but a series of predictions that interact with environmental dynamics. We present a framework for learning comprehensible models of sequential decision tasks in which agent strategies are characterized using temporal logic formulas. Given a set of agent traces, we first cluster the traces using a novel embedding method that captures frequent action patterns. We then search for logical formulas that explain the agent strategies in the different clusters. We evaluate our framework on combat scenarios in StarCraft II (SC2), using traces from a handcrafted expert policy and a trained reinforcement learning agent. We implemented a feature extractor for SC2 environments
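The two-stage pipeline described in the abstract (embed and cluster traces, then look for a formula that holds across each cluster) can be illustrated with a small sketch. This is not the authors' method: the embedding here is a plain action-bigram frequency vector standing in for the paper's unspecified "novel embedding", the clustering is a simple greedy cosine-similarity grouping, and the formula search only checks a fixed, hand-written candidate set of two temporal patterns ("eventually a" and "never a") over finite traces. All function names, action names, and the threshold are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_embedding(trace, n=2):
    """Map a trace (list of action names) to normalized n-gram frequencies.

    Assumption: a stand-in for the paper's embedding, which is not specified here.
    """
    grams = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_cluster(traces, threshold=0.5):
    """Put each trace in the first cluster whose seed embedding is similar enough."""
    seeds, clusters = [], []
    for trace in traces:
        emb = ngram_embedding(trace)
        for i, seed in enumerate(seeds):
            if cosine(emb, seed) >= threshold:
                clusters[i].append(trace)
                break
        else:
            # No existing cluster matched: this trace seeds a new one.
            seeds.append(emb)
            clusters.append([trace])
    return clusters


def eventually(action):
    """Finite-trace 'F action': the action occurs at some step."""
    return lambda trace: action in trace


def never(action):
    """Finite-trace 'G !action': the action occurs at no step."""
    return lambda trace: action not in trace


def explain(cluster, candidates):
    """Return the names of candidate formulas satisfied by every trace in the cluster."""
    return [name for name, holds in candidates.items()
            if all(holds(t) for t in cluster)]


# Hypothetical toy traces; real SC2 traces would carry far richer features.
traces = [
    ["move", "attack", "attack", "attack"],
    ["move", "attack", "attack"],
    ["move", "retreat", "retreat", "retreat"],
    ["retreat", "retreat", "move", "retreat"],
]
candidates = {
    "F attack": eventually("attack"),
    "G !retreat": never("retreat"),
    "F retreat": eventually("retreat"),
}

clusters = greedy_cluster(traces)
for cluster in clusters:
    print(len(cluster), "traces:", explain(cluster, candidates))
```

On these toy traces the aggressive and evasive runs share no bigrams, so they land in separate clusters, and each cluster is then described by the candidate formulas that all of its traces satisfy. A real system would search a much larger formula space rather than a fixed list.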
Related papers
- GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies Through Visual Counterfactual Explanations (2023)
- REVEAL-IT: Reinforcement Learning With Visibility Of Evolving Agent Policy For Interpretability (2024)
- Unveiling The Black Box: A Multi-layer Framework For Explaining Reinforcement Learning-based Cyber Agents (2025)
- Learning Impartial Policies For Sequential Counterfactual Explanations Using Deep Reinforcement Learning (2023)
- Local And Global Explanations Of Agent Behavior: Integrating Strategy Summaries With Saliency Maps (2020)
- Why The Agent Made That Decision: Contrastive Explanation Learning For Reinforcement Learning (2024)
- Explaining Conditions For Reinforcement Learning Behaviors From Real And Imagined Data (2020)
- TalkToAgent: A Human-centric Explanation Of Reinforcement Learning Agents With Large Language Models (2025)