Sample Efficient Model-free Reinforcement Learning From LTL Specifications With Optimality Guarantees
2023 · Daqian Shao, Marta Kwiatkowska
Abstract
Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt the probabilistic model checker PRISM to compute the probability of
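The abstract does not spell out the construction. The following is therefore only a minimal Python sketch of the general pattern it builds on: a product of an MDP with an automaton for the LTL formula, a reward and discount that depend on whether the product state is accepting, and off-the-shelf tabular Q-learning on top. It uses a scheme from earlier work (reward 1 - gamma_B and discount gamma_B in accepting product states, reward 0 and discount gamma elsewhere). It is not the more general product MDP, reward structure and discounting mechanism proposed in this paper. The toy MDP, the hand-written two-state automaton for "GF goal" (visit `goal` infinitely often), and all names and parameter values (`mdp_step`, `product_step`, `q_learning`, `gamma_b`, `epsilon`, ...) are hypothetical.

```python
import random

# Hypothetical toy MDP: state 0 is the start, state 1 is an absorbing sink,
# state 2 is an absorbing state labelled "goal". Action 0 leads to the sink;
# action 1 reaches the goal with probability 0.7 and the sink otherwise.
ACTIONS = (0, 1)
LABELS = {0: set(), 1: set(), 2: {"goal"}}


def mdp_step(s, a, rng):
    """Sample the next MDP state after taking action a in state s."""
    if s in (1, 2):  # absorbing states
        return s
    if a == 0:
        return 1
    return 2 if rng.random() < 0.7 else 1


# Hypothetical two-state deterministic Buchi automaton for "GF goal":
# it enters the accepting state 1 whenever the next label contains "goal".
ACCEPTING = {1}


def automaton_step(q, label):
    return 1 if "goal" in label else 0


def product_step(state, a, rng, gamma, gamma_b):
    """One product-MDP transition, returning (next_state, reward, discount)."""
    s, q = state
    s_next = mdp_step(s, a, rng)
    q_next = automaton_step(q, LABELS[s_next])
    if q_next in ACCEPTING:
        # Accepting product state: small reward, discount gamma_b.
        return (s_next, q_next), 1.0 - gamma_b, gamma_b
    # Non-accepting product state: no reward, discount gamma.
    return (s_next, q_next), 0.0, gamma


def q_learning(episodes=2000, horizon=20, alpha=0.1, epsilon=0.2,
               gamma=0.99, gamma_b=0.9, seed=0):
    """Tabular epsilon-greedy Q-learning on the product MDP."""
    rng = random.Random(seed)
    q_table = {}  # (product_state, action) -> value, missing entries are 0.0

    def q(state, a):
        return q_table.get((state, a), 0.0)

    for _ in range(episodes):
        state = (0, 0)  # MDP state 0, automaton state 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q(state, b))
            nxt, reward, discount = product_step(state, a, rng, gamma, gamma_b)
            # The discount used in the target depends on the product state reached.
            target = reward + discount * max(q(nxt, b) for b in ACTIONS)
            q_table[(state, a)] = q(state, a) + alpha * (target - q(state, a))
            state = nxt
    return q_table
```

Calling `q_learning()` and taking the greedy action in the start state `(0, 0)` should select action 1. In this toy example the fixed point of that Q-value is exactly 0.7, the maximal probability of visiting `goal` infinitely often, while action 0 keeps value 0. The learned estimate fluctuates around 0.7 because the step size `alpha` is constant.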
Related papers
- Sample-efficient Reinforcement Learning With Temporal Logic Objectives: Leveraging The Task Specification To Guide Exploration (2024) · 0.00
- Directed Exploration In Reinforcement Learning From Linear Temporal Logic (2024) · 0.00
- Learning Probabilistic Temporal Logic Specifications For Stochastic Systems (2025) · 0.00
- Logically-constrained Reinforcement Learning (2018) · 0.00
- Regret-free Reinforcement Learning For LTL Specifications (2024) · 0.00
- On The (in)tractability Of Reinforcement Learning For LTL Objectives (2021) · 0.00
- Logical Specifications-guided Dynamic Task Sampling For Reinforcement Learning Agents (2024) · 2.26
- A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks (2017) · 11.58