BXRL: Behavior-explainable Reinforcement Learning
2026 · Ram Rachum, Yotam Amitai, Yonatan Nakar, et al.
Abstract
A major challenge of Reinforcement Learning is that agents often learn undesired behaviors that seem to defy the reward structure they were given. Explainable Reinforcement Learning (XRL) methods can answer queries such as "explain this specific action", "explain this specific trajectory", and "explain the entire policy". However, XRL lacks a formal definition for behavior as a pattern of actions across many episodes. We provide such a definition, and use it to enable a new query: "Explain this behavior". We present Behavior-Explainable Reinforcement Learning (BXRL), a new problem formulation that treats behaviors as first-class objects. BXRL defines a behavior measure as any function \(m : \Pi \to \mathbb{R}\), allowing users to precisely express the pattern of actions that they find interesting and measure how strongly the policy exhibits it. We define contrastive behaviors that reduce the question "why does the agent prefer \(a\) to \(a'\)?" to "why is \(m(\pi)\) high?" which ca
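As an illustration of the behavior-measure idea, here is a minimal Python sketch of a function \(m : \Pi \to \mathbb{R}\) over a toy tabular policy. Everything in it is an assumption made for illustration and is not taken from the paper: the `Policy` representation, the helper names `action_frequency` and `contrastive`, and in particular the choice to express a contrastive behavior as a difference of action frequencies over a fixed set of states rather than over sampled episodes.

```python
from typing import Callable, Dict, List

State = int
Action = str
# Assumed toy representation: each state maps to a distribution over actions.
Policy = Dict[State, Dict[Action, float]]
# A behavior measure is any function from policies to real numbers.
BehaviorMeasure = Callable[[Policy], float]


def action_frequency(action: Action, states: List[State]) -> BehaviorMeasure:
    """Hypothetical measure: mean probability of choosing `action` over `states`."""
    def m(policy: Policy) -> float:
        return sum(policy[s].get(action, 0.0) for s in states) / len(states)
    return m


def contrastive(a: Action, a_prime: Action, states: List[State]) -> BehaviorMeasure:
    """Hypothetical contrastive measure: high when `a` is preferred to `a_prime`."""
    m_a = action_frequency(a, states)
    m_a_prime = action_frequency(a_prime, states)
    return lambda policy: m_a(policy) - m_a_prime(policy)


# A two-state policy that leans towards "left".
policy: Policy = {
    0: {"left": 0.75, "right": 0.25},
    1: {"left": 0.5, "right": 0.5},
}

m = contrastive("left", "right", states=[0, 1])
score = m(policy)  # positive when "left" is chosen more often than "right"
```

Under this sketch, the question "why does the agent prefer left to right?" becomes "why is `m(policy)` high?", because `m` is an ordinary scalar function of the policy.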
Related papers
- A Survey of Explainable Reinforcement Learning (2022)
- XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques (2024)
- Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey (2021)
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges (2022)
- A Survey of Explainable Reinforcement Learning: Targets, Methods and Needs (2025)
- Explainable Reinforcement Learning: A Survey (2020)
- Interestingness Elements for Explainable Reinforcement Learning: Understanding Agents' Capabilities and Limitations (2019)
- Explaining Reinforcement Learning Agents Through Counterfactual Action Outcomes (2023)