RGMDT: Return-gap-minimizing Decision Tree Extraction In Non-Euclidean Metric Space
2024 · Jingdi Chen, Hanhan Zhou, Yongsheng Mei, et al.
Abstract
Deep Reinforcement Learning (DRL) algorithms have achieved great success in solving many challenging tasks, but their black-box nature hinders interpretability and real-world applicability, making it difficult for human experts to interpret and understand DRL policies. Existing work on interpretable reinforcement learning has shown promise in extracting decision tree (DT) based policies from DRL policies, but it focuses mostly on single-agent settings, and prior attempts to introduce DT policies in multi-agent scenarios rely mainly on heuristic designs that provide no quantitative guarantee on the expected return. In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over the local observation and action-value space of each agent, with action values as cluster labels and the upper bound on the return …
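The reduction described above (extract a tree by clustering observations according to their action values) can be illustrated with a minimal single-agent sketch. This is not RGMDT: the Chebyshev (L∞) distance on action-value vectors, the k-medoids clustering, the tiny Gini-based tree, the toy data, and every name (`linf`, `k_medoids`, `cluster_actions`, `fit_tree`, `extract_policy`) are hypothetical stand-ins, and the sketch does not compute the paper's return-gap bound.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vec = Sequence[float]

def linf(u: Vec, v: Vec) -> float:
    # Chebyshev (L-infinity) distance between action-value vectors. It is a true
    # metric but not Euclidean; choosing it here is an illustrative assumption.
    return max(abs(a - b) for a, b in zip(u, v))

def assign(points: List[Vec], medoids: List[int]) -> List[int]:
    # Each point joins the cluster whose medoid is nearest under linf.
    return [min(range(len(medoids)), key=lambda c: linf(p, points[medoids[c]]))
            for p in points]

def k_medoids(points: List[Vec], k: int, iters: int = 20) -> List[int]:
    # Alternating k-medoids. Medoids are actual data points, so only pairwise
    # distances are needed, whereas k-means' mean update assumes Euclidean geometry.
    medoids = list(range(k))  # deterministic init: the first k points
    for _ in range(iters):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members, key=lambda m: sum(linf(points[m], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(points, medoids)

def cluster_actions(q: List[Vec], labels: List[int]) -> Dict[int, int]:
    # Label each cluster with the action that maximizes its mean action value.
    actions: Dict[int, int] = {}
    for c in set(labels):
        members = [q[i] for i, lab in enumerate(labels) if lab == c]
        n_actions = len(members[0])
        means = [sum(m[a] for m in members) / len(members) for a in range(n_actions)]
        actions[c] = max(range(n_actions), key=lambda a: means[a])
    return actions

def gini(y: List[int]) -> float:
    return 1.0 - sum((n / len(y)) ** 2 for n in Counter(y).values())

def fit_tree(obs: List[Vec], y: List[int], depth: int):
    # Tiny CART: axis-aligned split minimizing weighted Gini impurity.
    # A leaf is an int (cluster id); an internal node is (feature, threshold, left, right).
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(obs[0])):
        for t in sorted({o[f] for o in obs}):
            left = [lab for o, lab in zip(obs, y) if o[f] <= t]
            right = [lab for o, lab in zip(obs, y) if o[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    li = [i for i, o in enumerate(obs) if o[f] <= t]
    ri = [i for i, o in enumerate(obs) if o[f] > t]
    return (f, t,
            fit_tree([obs[i] for i in li], [y[i] for i in li], depth - 1),
            fit_tree([obs[i] for i in ri], [y[i] for i in ri], depth - 1))

def predict(tree, o: Vec) -> int:
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if o[f] <= t else right
    return tree

def extract_policy(obs: List[Vec], q: List[Vec], k: int, depth: int) -> Callable[[Vec], int]:
    # Hypothetical pipeline: cluster by action values, fit a tree from observations
    # to cluster ids, then act with each cluster's best mean action.
    labels = k_medoids(q, k)
    tree = fit_tree(obs, labels, depth)
    actions = cluster_actions(q, labels)
    return lambda o: actions[predict(tree, o)]

# Toy single-agent data: 1-D observations and action values Q(o, a) for 2 actions.
OBS = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
Q = [[1.0, 0.0], [0.9, 0.1], [1.1, 0.0], [0.0, 1.0], [0.1, 1.2], [0.0, 0.9]]

if __name__ == "__main__":
    policy = extract_policy(OBS, Q, k=2, depth=2)
    print([policy([0.05]), policy([0.95])])  # prints [0, 1]
```

Because k-medoids only compares data points pairwise, `linf` could be swapped for any other metric without touching the rest; the paper's own metric and its return-gap objective are not reproduced here.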
Related papers
- CDT: Cascading Decision Trees For Explainable Reinforcement Learning (2020)
- Optimizing Interpretable Decision Tree Policies For Reinforcement Learning (2024)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- Understanding What Affects The Generalization Gap In Visual Reinforcement Learning: Theory And Empirical Evidence (2024)
- A Risk-sensitive Approach To Policy Optimization (2022)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)
- Iterative Bounding MDPs: Learning Interpretable Policies Via Non-interpretable Methods (2021)