A PAC Learning Algorithm For LTL And Omega-regular Objectives In Mdps
2023 Β· Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
Abstract
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
Authors
(none)
Tags
Stats
Related papers
- On The (in)tractability Of Reinforcement Learning For LTL Objectives (2021)0.00
- Computably Continuous Reinforcement-learning Objectives Are Pac-learnable (2023)7.81
- Sample Efficient Model-free Reinforcement Learning From LTL Specifications With Optimality Guarantees (2023)0.00
- A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees (2024)0.00
- Sample-efficient Reinforcement Learning With Temporal Logic Objectives: Leveraging The Task Specification To Guide Exploration (2024)0.00
- Omega-regular Decision Processes (2023)0.00
- Efficient PAC Reinforcement Learning In Regular Decision Processes (2021)2.26
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58