Synthesising Reinforcement Learning Policies Through Set-valued Inductive Rule Learning
2021 · Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, et al.
Abstract
Today's advanced Reinforcement Learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distillation algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy (a mapping from states to actions) but also produces extra meta-information, such as action values indicating the quality of alternative actions. This meta-information can indicate whether more than one action is near-optimal for a certain state. We extend CN2 so that it can leverage knowledge about equally good actions to distill the policy into fewer rules, making the result easier for a person to interpret. Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment. We demonstrate the
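The idea of using action values to find equally good actions can be illustrated with a minimal sketch. It is not the paper's extended CN2 algorithm. The function names, the example Q-values, and the fixed `tolerance` threshold are all assumptions made for illustration. The sketch only shows how a set of near-optimal actions per state could be derived, and why one rule can then cover states that a single-label dataset would split.

```python
def near_optimal_actions(q_values, tolerance=0.05):
    """Return the indices of all actions whose value lies within
    `tolerance` of the best action value (hypothetical threshold)."""
    best = max(q_values)
    return {a for a, q in enumerate(q_values) if best - q <= tolerance}


def set_valued_dataset(q_table, tolerance=0.05):
    """Turn a state -> action-values mapping into (state, action-set) samples."""
    return [(state, near_optimal_actions(qs, tolerance))
            for state, qs in q_table.items()]


def rule_covers_all(action, samples):
    """A rule predicting `action` is consistent with a sample when the
    action belongs to that sample's set of near-optimal actions."""
    return all(action in actions for _, actions in samples)


# Made-up action values for two states and three actions.
q_table = {
    (0, 0): [1.00, 0.98, 0.20],  # actions 0 and 1 are near-optimal
    (0, 1): [0.50, 0.90, 0.89],  # actions 1 and 2 are near-optimal
}
samples = set_valued_dataset(q_table)

# The greedy (argmax) labels differ: action 0 in one state, action 1 in the
# other, so single-label rule learning would need two rules here.
greedy = [max(range(len(qs)), key=qs.__getitem__) for qs in q_table.values()]

# With set-valued labels, a single rule predicting action 1 covers both states.
one_rule_suffices = rule_covers_all(1, samples)
```

In this toy case the greedy labels are `[0, 1]`, while the set-valued labels `{0, 1}` and `{1, 2}` share action 1. This is the kind of overlap a set-valued rule learner could exploit to produce fewer rules.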
Related papers
- "So, Tell Me About Your Policy...": Distillation Of Interpretable Policies From Deep Reinforcement Learning Agents (2025)
- Theoretically Guaranteed Policy Improvement Distilled From Model-based Planning (2023)
- Reward-conditioned Policies (2019)
- Evaluating Interpretable Reinforcement Learning By Distilling Policies Into Programs (2025)
- Continuous Action Reinforcement Learning From A Mixture Of Interpretable Experts (2020)
- S-REINFORCE: A Neuro-symbolic Policy Gradient Approach For Interpretable Reinforcement Learning (2023)
- Verifiable Reinforcement Learning Via Policy Extraction (2018)
- Ranking Policy Decisions (2020)