Iterative Bounding MDPs: Learning Interpretable Policies Via Non-interpretable Methods
2021 · Nicholay Topin, Stephanie Milani, Fei Fang, et al.
Abstract
Current work in explainable reinforcement learning generally produces policies in the form of a decision tree over the state space. Such policies can be used for formal safety verification, agent behavior prediction, and manual inspection of important features. However, existing approaches fit a decision tree after training or use a custom learning procedure which is not compatible with new learning techniques, such as those which use neural networks. To address this limitation, we propose a novel Markov Decision Process (MDP) type for learning decision tree policies: Iterative Bounding MDPs (IBMDPs). An IBMDP is constructed around a base MDP so each IBMDP policy is guaranteed to correspond to a decision tree policy for the base MDP when using a method-agnostic masking procedure. Because of this decision tree equivalence, any function approximator can be used during training, including a neural network, while yielding a decision tree policy for the base MDP. We present the required mas
Related papers
- Optimizing Interpretable Decision Tree Policies For Reinforcement Learning (2024)
- Generation Of Policy-level Explanations For Reinforcement Learning (2019)
- Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning With Clairvoyant Experts (2020)
- MAVIPER: Learning Decision Tree Policies For Interpretable Multi-agent Reinforcement Learning (2022)
- On Learning History Based Policies For Controlling Markov Decision Processes (2022)
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)
- Model-based Exploration In Monitored Markov Decision Processes (2025)
- On-line Learning In Tree MDPs By Treating Policies As Bandit Arms (2026)