On the (In)Tractability of Reinforcement Learning for LTL Objectives
2021 · Cambridge Yang, Michael Littman, Michael Carbin
Abstract
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas that are decidable within a finite horizon. Practically, our result implies that it is impossible for a
Related papers
- Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees (2023)
- Sample-efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration (2024)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic (2024)
- A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs (2023)
- Computably Continuous Reinforcement-learning Objectives Are PAC-learnable (2023)
- Regret-free Reinforcement Learning for LTL Specifications (2024)
- Learning Probabilistic Temporal Logic Specifications for Stochastic Systems (2025)
- Logically-constrained Reinforcement Learning (2018)