Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
2022 · Alexis Jacq, Johan Ferret, Olivier Pietquin, et al.
Abstract
Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process and make a new mode of action available: being lazy, which defers decision-making to a default policy. In addition, we penalize non-lazy actions in order to encourage minimal effort and have agents focus on critical decisions only. We name the resulting formalism lazy-MDPs. We study the theoretical properties of lazy-MDPs, expressing value functions and characterizing optimal solutions. Then we empirically demonstrate that policies learned in lazy-MDPs generally come with a form of interpretability: by construction, they show us the states where the agent takes control over the default policy. W
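The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ChainMDP` toy environment, the `LazyMDP` wrapper, the `default_policy` callable, and the `penalty` parameter name and value are all assumptions introduced here for illustration. The sketch shows only the two ingredients the abstract names: an extra "lazy" action that defers to a default policy, and a cost charged whenever the agent picks a non-lazy action.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ChainMDP:
    """Hypothetical base MDP: states 0..n_states-1 on a line, reward 1 at the last state."""
    n_states: int = 5
    n_actions: int = 2  # 0 = move left, 1 = move right

    def step(self, state: int, action: int) -> Tuple[int, float, bool]:
        move = 1 if action == 1 else -1
        next_state = min(max(state + move, 0), self.n_states - 1)
        done = next_state == self.n_states - 1
        return next_state, (1.0 if done else 0.0), done


@dataclass
class LazyMDP:
    """Assumed wrapper: adds one lazy action and penalizes every non-lazy action."""
    base: ChainMDP
    default_policy: Callable[[int], int]  # maps a state to a base action
    penalty: float = 0.1                  # assumed cost of taking control

    @property
    def lazy_action(self) -> int:
        # The lazy action is indexed right after the base actions.
        return self.base.n_actions

    def step(self, state: int, action: int) -> Tuple[int, float, bool]:
        if action == self.lazy_action:
            # Being lazy: the default policy chooses, and no penalty is paid.
            return self.base.step(state, self.default_policy(state))
        # Taking control: same transition as the base MDP, minus the penalty.
        next_state, reward, done = self.base.step(state, action)
        return next_state, reward - self.penalty, done


# Usage: with a default policy that always moves right, being lazy reaches
# the goal for free, while choosing "right" explicitly costs the penalty.
env = LazyMDP(ChainMDP(), default_policy=lambda s: 1, penalty=0.1)
lazy_result = env.step(3, env.lazy_action)
control_result = env.step(3, 1)
```

Under this sketch, an agent trained in the wrapped environment would have an incentive to select a non-lazy action only in states where overriding the default policy is worth the penalty, which is the source of the interpretability the abstract describes.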
Related papers
- The Virtues Of Laziness In Model-based RL: A Unified Objective And Algorithms (2023)
- Learn A Flexible Exploration Model For Parameterized Action Markov Decision Processes (2025)
- Sample Efficient Model-free Reinforcement Learning From LTL Specifications With Optimality Guarantees (2023)
- Interpretable Learning Dynamics In Unsupervised Reinforcement Learning (2025)
- Reinforcement Learning With Sparse-executing Actions Via Sparsity Regularization (2021)
- Active Inference And Reinforcement Learning: A Unified Inference On Continuous State And Action Spaces Under Partial Observability (2022)
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021)
- Learning Interpretable Policies In Hindsight-observable POMDPs Through Partially Supervised Reinforcement Learning (2024)