Temporal-logic-based Reward Shaping For Continuing Reinforcement Learning Tasks
2020 · Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, et al.
Abstract
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted-reward formulation. As in other settings, learning an optimal policy here typically requires a large amount of training experience. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. To avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function
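Why shaping can leave the average-reward optimum intact comes down to a telescoping argument. Consider an undiscounted, potential-based shaping term F(s, s') = Φ(s') − Φ(s). Over T steps it adds Φ(s_T) − Φ(s_0) to the total reward, so it changes the T-step average by at most (max Φ − min Φ)/T, and that vanishes as T grows when Φ is bounded. The sketch below checks this numerically. It assumes a hypothetical five-state ring, a fixed stochastic policy, and a hand-picked potential (`potential`, `N_STATES`, and the 0.8 move probability are all illustrative choices). It is not the paper's construction, and its potential is not derived from a temporal logic formula.

```python
import random

# Hypothetical toy chain: states 0..N_STATES-1 arranged on a ring.
N_STATES = 5
MOVE_PROB = 0.8  # fixed policy: advance one state w.p. 0.8, otherwise stay put


def reward(s, s_next):
    # Original reward: +1 only when the agent wraps from the last state back to 0.
    return 1.0 if (s == N_STATES - 1 and s_next == 0) else 0.0


def potential(s):
    # Hypothetical bounded potential: progress around the ring (0..N_STATES-1).
    return float(s)


def shaped_reward(s, s_next):
    # Undiscounted potential-based shaping: r + Phi(s') - Phi(s).
    return reward(s, s_next) + potential(s_next) - potential(s)


def average_rewards(horizon, seed=0):
    """Return (original, shaped) average reward and the final state after `horizon` steps."""
    rng = random.Random(seed)
    s = 0
    total, total_shaped = 0.0, 0.0
    for _ in range(horizon):
        s_next = (s + 1) % N_STATES if rng.random() < MOVE_PROB else s
        total += reward(s, s_next)
        total_shaped += shaped_reward(s, s_next)
        s = s_next
    return total / horizon, total_shaped / horizon, s


if __name__ == "__main__":
    avg, avg_shaped, _ = average_rewards(100_000)
    print(avg, avg_shaped)  # the two averages differ by at most 4 / 100000
```

Because every policy's long-run average reward is unchanged by a bounded shaping term of this form, the set of average-reward-optimal policies is unchanged as well. The shaping can only affect how quickly a learner finds one of them.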
Related papers
- Funnel-based Reward Shaping For Signal Temporal Logic Tasks In Reinforcement Learning (2022) · 7.16
- Adaptive Reward Design For Reinforcement Learning (2024) · 0.00
- A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks (2017) · 11.58
- Directed Exploration In Reinforcement Learning From Linear Temporal Logic (2024) · 0.00
- Subgoal-based Reward Shaping To Improve Efficiency In Reinforcement Learning (2021) · 0.00
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024) · 0.00
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024) · 0.00
- Shaping Advice In Deep Reinforcement Learning (2022) · 0.00