Designing Rewards For Fast Learning
2022 · Henry Sowerby, Zhiyuan Zhou, Michael L. Littman
Abstract
To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward function for the environment, arguably the most important knob designers have in interacting with RL agents. Although many reward functions induce the same optimal behavior (Ng et al., 1999), in practice, some of them result in faster learning than others. In this paper, we look at how reward-design choices impact learning speed and seek to identify principles of good reward design that quickly induce target behavior. This reward-identification problem is framed as an optimization problem: Firstly, we advocate choosing state-based rewards that maximize the action gap, making optimal actions easy to distinguish from suboptimal ones. Secondly, we propose minimizing a measure of the horizon, something we call the "subjective discount", over which rewards need to be optimized to encourage agents to make optimal decisions with less lookahead. To solve this optimization problem, we propose a li
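The action-gap idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the three-state chain, the two reward vectors, the discount of 0.9, and the helper names `value_iteration` and `min_action_gap` are all assumptions made for illustration, and the gap is taken in its common form (best Q-value minus second-best Q-value at each state). Both reward functions below are state-based, span a range of width 1, and induce the same optimal behavior (always move right), yet they yield different minimum action gaps.

```python
# Hypothetical 3-state chain: states 0 and 1 are non-terminal, state 2 is the goal.
# Action 0 moves left (or stays at state 0); action 1 moves right. Deterministic.
NEXT_STATE = [[0, 1], [0, 2], [2, 2]]
TERMINAL = {2}
GAMMA = 0.9  # assumed discount factor


def value_iteration(next_state, reward, gamma, terminal, iters=500):
    """Return Q*(s, a) when reward[s'] is received on entering state s'."""
    n_states = len(next_state)
    n_actions = len(next_state[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Terminal states have zero continuation value.
        v = [0.0 if s in terminal else max(q[s]) for s in range(n_states)]
        q = [
            [reward[next_state[s][a]] + gamma * v[next_state[s][a]]
             for a in range(n_actions)]
            for s in range(n_states)
        ]
    return q


def min_action_gap(q, terminal):
    """Smallest difference between best and second-best Q-value over non-terminal states."""
    gaps = []
    for s, row in enumerate(q):
        if s in terminal:
            continue
        best, second = sorted(row, reverse=True)[:2]
        gaps.append(best - second)
    return min(gaps)


def greedy_policy(q, terminal):
    """Greedy action at each non-terminal state."""
    return [row.index(max(row)) for s, row in enumerate(q) if s not in terminal]


sparse_reward = [0.0, 0.0, 1.0]     # reward only at the goal
penalty_reward = [-1.0, -1.0, 0.0]  # penalty everywhere except the goal

q_sparse = value_iteration(NEXT_STATE, sparse_reward, GAMMA, TERMINAL)
q_penalty = value_iteration(NEXT_STATE, penalty_reward, GAMMA, TERMINAL)

print("sparse  gap:", min_action_gap(q_sparse, TERMINAL))
print("penalty gap:", min_action_gap(q_penalty, TERMINAL))
```

In this toy chain the penalty-style reward separates optimal from suboptimal actions by a noticeably wider margin than the sparse goal reward, which is the kind of difference the abstract's first criterion targets. The "subjective discount" criterion is not modeled here.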
Related papers
- Reward Design For Reinforcement Learning Agents (2025)
- Tiered Reward: Designing Rewards For Specification And Fast Learning Of Desired Behavior (2022)
- Informativeness Of Reward Functions In Reinforcement Learning (2024)
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024)
- Pitfalls Of Learning A Reward Function Online (2020)
- Provably Feedback-efficient Reinforcement Learning Via Active Reward Learning (2023)
- On Learning Intrinsic Rewards For Policy Gradient Methods (2018)
- Reward Models In Deep Reinforcement Learning: A Survey (2025)