TGPO: Temporal Grounded Policy Optimization For Signal Temporal Logic Tasks
2025 · Yue Meng, Fei Chen, Chuchu Fan
Abstract
Learning control policies for complex, long-horizon tasks is a central challenge in robotics and autonomous systems. Signal Temporal Logic (STL) offers a powerful and expressive language for specifying such tasks, but its non-Markovian nature and inherently sparse rewards make it difficult to solve with standard Reinforcement Learning (RL) algorithms. Prior RL approaches either focus only on limited STL fragments or use STL robustness scores as sparse terminal rewards. In this paper, we propose TGPO (Temporal Grounded Policy Optimization) to solve general STL tasks. TGPO decomposes an STL specification into timed subgoals and invariant constraints and tackles the problem with a hierarchical framework. The high-level component of TGPO proposes concrete time allocations for these subgoals, and the low-level time-conditioned policy learns to achieve the sequenced subgoals using a dense, stage-wise reward signal. During inference, we sample various time allocations and select the most promising assignment f
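The hierarchy described above (timed subgoals plus an invariant, a high level that fixes concrete times, a time-conditioned low level with a stage-wise reward, and sampling of allocations at inference) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not TGPO's actual components: the two-subgoal reach-avoid task, the proportional controller standing in for the learned time-conditioned policy, the form of `stage_reward`, and the selection rule. The abstract is cut off before it says how the most promising assignment is chosen, so here each sampled allocation is simply rolled out and scored by standard STL robustness. All names (`SUBGOALS`, `sample_allocation`, `time_conditioned_policy`, `select_allocation`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical STL task over time steps 0..H (illustrative, not from the paper):
#   F_[5,15] reach(A)  AND  F_[15,30] reach(B)  AND  G avoid(O)
H = 30
SUBGOALS = [((5.0, 0.0), 0.5, 5, 15),   # (center, radius, window start, window end)
            ((5.0, 5.0), 0.5, 15, 30)]
OBSTACLE = ((2.5, 1.0), 0.8)            # (center, radius)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def stl_robustness(traj, subgoals, obstacle):
    """Standard quantitative semantics: F -> max over window, G -> min, AND -> min."""
    parts = [max(r - dist(traj[t], c) for t in range(a, b + 1))
             for (c, r, a, b) in subgoals]
    oc, orad = obstacle
    parts.append(min(dist(x, oc) - orad for x in traj))
    return min(parts)


def sample_allocation(subgoals, rng):
    """High level (assumed): pick a concrete time t_i in each window, strictly increasing."""
    while True:
        times = [rng.randint(a, b) for (_, _, a, b) in subgoals]
        if all(t1 < t2 for t1, t2 in zip(times, times[1:])):
            return times


def active_stage(t, times):
    """Index of the first subgoal whose allocated time is still ahead (else the last)."""
    for k, tk in enumerate(times):
        if t < tk:
            return k
    return len(times) - 1


def time_conditioned_policy(x, t, times, subgoals, vmax=1.0):
    """Stand-in for the learned low-level policy: steer toward the active subgoal so as
    to arrive at its allocated time, with speed clipped to vmax."""
    k = active_stage(t, times)
    c = subgoals[k][0]
    steps_left = max(times[k] - t, 1)
    vx, vy = (c[0] - x[0]) / steps_left, (c[1] - x[1]) / steps_left
    speed = math.hypot(vx, vy)
    if speed > vmax:
        vx, vy = vx * vmax / speed, vy * vmax / speed
    return vx, vy


def stage_reward(x, t, times, subgoals):
    """Assumed dense stage-wise reward: active-stage index minus distance to its subgoal."""
    k = active_stage(t, times)
    return k - dist(x, subgoals[k][0])


def rollout(times, subgoals, x0=(0.0, 0.0)):
    """Simulate a single integrator x_{t+1} = x_t + u_t for H steps; also sum the reward."""
    traj, ret = [x0], 0.0
    for t in range(H):
        x = traj[-1]
        ret += stage_reward(x, t, times, subgoals)
        vx, vy = time_conditioned_policy(x, t, times, subgoals)
        traj.append((x[0] + vx, x[1] + vy))
    return traj, ret


def select_allocation(subgoals, obstacle, n_samples=64, seed=0):
    """Inference (assumed rule): sample allocations, roll each out, keep the most robust."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        times = sample_allocation(subgoals, rng)
        traj, _ = rollout(times, subgoals)
        rho = stl_robustness(traj, subgoals, obstacle)
        if best is None or rho > best[0]:
            best = (rho, times)
    return best


rho, times = select_allocation(SUBGOALS, OBSTACLE)
print(f"best allocation {times}, robustness {rho:.3f}")
```

The summed stage reward returned by `rollout` is the dense signal a learner would train the low-level policy on; the selection step ignores it and uses the sparse STL robustness instead, which mirrors the split between dense training reward and specification-level evaluation. Rolling out every sample is only the simplest scoring rule and is used here because the abstract does not name TGPO's.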