A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
2017 · Xiao Li, Yao Ma, Calin Belta
Abstract
Reward engineering is an important aspect of reinforcement learning. Whether the user's intentions are correctly encapsulated in the reward function can significantly affect the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This tuning can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated into a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
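The idea of turning a TL formula into a real-valued satisfaction measure can be illustrated with a minimal sketch. It assumes the common min/max quantitative semantics for "always" and "eventually"; the abstract does not state the exact logic or semantics used in the paper, so the function names, predicates, and thresholds below are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]
# A predicate returns a signed margin: positive when it holds at a state.
Predicate = Callable[[State], float]


def eventually(pred: Predicate, traj: Sequence[State]) -> float:
    """Best margin reached at any step of the trajectory."""
    return max(pred(s) for s in traj)


def always(pred: Predicate, traj: Sequence[State]) -> float:
    """Worst margin over all steps of the trajectory."""
    return min(pred(s) for s in traj)


def conjunction(*values: float) -> float:
    """A conjunction is only as satisfied as its weakest part."""
    return min(values)


# Hypothetical 1-D task (an assumption, not from the paper):
# eventually reach x > 0.9 while always keeping x < 1.5.
def reach_goal(s: State) -> float:
    return s[0] - 0.9


def stay_safe(s: State) -> float:
    return 1.5 - s[0]


def robustness(traj: Sequence[State]) -> float:
    """Real-valued satisfaction of the whole specification on one trajectory."""
    return conjunction(eventually(reach_goal, traj), always(stay_safe, traj))


good = [[0.0], [0.5], [1.0], [1.2]]  # reaches the goal and stays safe
bad = [[0.0], [0.5], [1.0], [1.6]]   # reaches the goal but leaves the safe set

print(robustness(good) > 0)  # specification satisfied
print(robustness(bad) > 0)   # specification violated
```

Under these assumptions, a positive value means the trajectory satisfies the specification and a negative value means it violates it. A policy search method could use such a value as the return of a rollout in place of a hand-tuned reward.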
Related papers
- Directed Exploration In Reinforcement Learning From Linear Temporal Logic (2024)
- Sample-efficient Reinforcement Learning With Temporal Logic Objectives: Leveraging The Task Specification To Guide Exploration (2024)
- Funnel-based Reward Shaping For Signal Temporal Logic Tasks In Reinforcement Learning (2022)
- Adaptive Reward Design For Reinforcement Learning (2024)
- Logical Specifications-guided Dynamic Task Sampling For Reinforcement Learning Agents (2024)
- Sample Efficient Model-free Reinforcement Learning From LTL Specifications With Optimality Guarantees (2023)
- Temporal-logic-based Reward Shaping For Continuing Reinforcement Learning Tasks (2020)
- Probabilistic Satisfaction Of Temporal Logic Constraints In Reinforcement Learning Via Adaptive Policy-switching (2024)