Average Reward Reinforcement Learning For Omega-regular And Mean-payoff Objectives

Abstract

Recent advances in reinforcement learning (RL) have renewed interest in reward design for shaping agent behavior, but manually crafting reward functions is tedious and error-prone. A principled alternative is to specify behavioral requirements in a formal, unambiguous language and automatically compile them into learning objectives. \(\omega\)-regular languages are a natural fit, given their role in formal verification and synthesis. However, most existing \(\omega\)-regular RL approaches operate in an episodic, discounted setting with periodic resets, which is misaligned with \(\omega\)-regular semantics over infinite traces. For continuing tasks, where the agent interacts with the environment over a single uninterrupted lifetime, the average-reward criterion is more appropriate. We focus on absolute liveness specifications, a subclass of \(\omega\)-regular languages that cannot be violated by any finite prefix and thus aligns naturally with continuing interaction. We present the fi
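The abstract contrasts discounted and average-reward objectives for continuing tasks. The toy comparison below makes that contrast concrete. It is an illustration under our own assumptions, not the paper's algorithm. The function names `average_reward` and `discounted_return`, the horizon T = 10,000 (a finite stand-in for an infinite lifetime), the discount factor 0.9, and both reward sequences are hypothetical. Trajectory A is rewarded only during its first 10 steps. Trajectory B is unrewarded for 100 steps and rewarded on every step after that.

```python
def average_reward(rewards):
    """Empirical average reward (1/T) * sum_{t<T} r_t over a finite prefix."""
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(rewards) / len(rewards)


def discounted_return(rewards, gamma):
    """Discounted return sum_{t<T} gamma**t * r_t, for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total, weight = 0.0, 1.0
    for r in rewards:
        total += weight * r
        weight *= gamma  # weight equals gamma**t at step t
    return total


T = 10_000  # hypothetical finite horizon standing in for an infinite lifetime
# Hypothetical trajectory A: rewarded early, then never again.
traj_a = [1.0] * 10 + [0.0] * (T - 10)
# Hypothetical trajectory B: no reward at first, then rewarded forever after.
traj_b = [0.0] * 100 + [1.0] * (T - 100)

for name, traj in (("A", traj_a), ("B", traj_b)):
    print(name,
          "average:", round(average_reward(traj), 4),
          "discounted:", round(discounted_return(traj, 0.9), 4))
```

With γ = 0.9 the discounted return ranks A above B, because rewards beyond the first few dozen steps are weighted almost to zero. The average reward over the single long run ranks B far above A (0.99 versus 0.001). This is one way to see why an objective that must keep being satisfied throughout an uninterrupted lifetime may be better served by the average-reward criterion than by discounting.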
