Logically-constrained Reinforcement Learning
2018 · Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
Abstract
We present the first model-free Reinforcement Learning (RL) algorithm to synthesise policies for an unknown Markov Decision Process (MDP), such that a linear time property is satisfied. The given temporal property is converted into a Limit Deterministic Büchi Automaton (LDBA) and a robust reward function is defined over the state-action pairs of the MDP according to the resulting LDBA. With this reward function, the policy synthesis procedure is "constrained" by the given specification. These constraints guide the MDP exploration so as to minimize the solution time by only considering the portion of the MDP that is relevant to satisfaction of the LTL property. This improves performance and scalability of the proposed method by avoiding an exhaustive update over the whole state space, whereas the efficiency of standard methods such as dynamic programming is hindered by excessive memory requirements caused by the need to store a full model in memory. Additionally, we show that the RL proce
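The idea of rewarding state-action pairs according to an automaton can be illustrated with a small sketch. This is not the authors' algorithm or reward function: it assumes a hypothetical five-cell corridor MDP, a hand-written three-state automaton for the property "avoid `unsafe` until `goal`" (a reachability simplification rather than a full Büchi condition), and plain tabular Q-learning over (cell, automaton state) pairs. All names and parameters below are illustrative assumptions.

```python
import random

# Hypothetical 1-D corridor MDP (an assumption for illustration): cells 0..4.
N_CELLS = 5
START = 2
ACTIONS = (-1, +1)                 # move left / move right
LABELS = {0: "unsafe", 4: "goal"}  # atomic proposition observed in a cell

# Hand-written automaton for "(not unsafe) until goal".
# A real pipeline would obtain an LDBA from an LTL-to-automaton translator.
Q_INIT, Q_ACC, Q_TRAP = 0, 1, 2
ACCEPTING = {Q_ACC}


def automaton_step(q, label):
    """Advance the automaton on the label of the cell just entered."""
    if q != Q_INIT:
        return q                   # Q_ACC and Q_TRAP are both sinks
    if label == "unsafe":
        return Q_TRAP
    if label == "goal":
        return Q_ACC
    return Q_INIT


def mdp_step(cell, action, rng, slip=0.1):
    """Unknown-to-the-learner dynamics: the move is reversed with prob. slip."""
    if rng.random() < slip:
        action = -action
    return min(max(cell + action, 0), N_CELLS - 1)


def reward(q_next):
    """Reward is defined by the automaton, not by the MDP itself."""
    return 1.0 if q_next in ACCEPTING else 0.0


def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}  # (cell, q, action) -> value; only visited pairs are ever stored
    for _ in range(episodes):
        cell, q = START, Q_INIT
        for _ in range(horizon):
            if q != Q_INIT:
                break              # automaton reached a sink: episode is decided
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q.get((cell, q, b), 0.0))
            nxt = mdp_step(cell, a, rng)
            q_nxt = automaton_step(q, LABELS.get(nxt))
            r = reward(q_nxt)
            if q_nxt == Q_INIT:
                best_next = max(Q.get((nxt, q_nxt, b), 0.0) for b in ACTIONS)
            else:
                best_next = 0.0    # no bootstrapping from automaton sinks
            old = Q.get((cell, q, a), 0.0)
            Q[(cell, q, a)] = old + alpha * (r + gamma * best_next - old)
            cell, q = nxt, q_nxt
    return Q


def greedy_action(Q, cell, q=Q_INIT):
    """Greedy policy over the learned values for a product state."""
    return max(ACTIONS, key=lambda b: Q.get((cell, q, b), 0.0))
```

Calling `Q = train()` and then `greedy_action(Q, 2)` returns the move the learned policy prefers from the start cell. Because the Q-table is a dictionary filled lazily, only the product states actually visited during learning take up memory, which loosely mirrors the abstract's point about avoiding an exhaustive update over the whole state space.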
Related papers
- Sample Efficient Model-free Reinforcement Learning From LTL Specifications With Optimality Guarantees (2023)
- Regret-free Reinforcement Learning For LTL Specifications (2024)
- Directed Exploration In Reinforcement Learning From Linear Temporal Logic (2024)
- Sample-efficient Reinforcement Learning With Temporal Logic Objectives: Leveraging The Task Specification To Guide Exploration (2024)
- Model-free \(\mu\) Synthesis Via Adversarial Reinforcement Learning (2021)
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)