Reward Tweaking: Maximizing The Total Reward While Planning For Short Horizons
2020 · Chen Tessler, Shie Mannor
Abstract
In reinforcement learning, the discount factor \(\gamma\) controls the agent's effective planning horizon. Traditionally, this parameter was considered part of the MDP; however, as deep reinforcement learning algorithms tend to become unstable when the effective planning horizon is long, recent works refer to \(\gamma\) as a hyper-parameter -- thus changing the underlying MDP and potentially leading the agent towards sub-optimal behavior on the original task. In this work, we introduce *reward tweaking*. Reward tweaking learns a surrogate reward function \(\tilde r\) for the discounted setting that induces optimal behavior on the original finite-horizon total reward task. Theoretically, we show that there exists a surrogate reward that leads to optimality in the original task and discuss the robustness of our approach. Additionally, we perform experiments in high-dimensional continuous control tasks and show that reward tweaking guides the agent towards better long-horizon returns alth
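The abstract's point that treating \(\gamma\) as a hyper-parameter can change which behavior looks optimal can be illustrated with a small sketch. The two reward sequences below (`early` and `late`) and the helper names are hypothetical and chosen only for illustration; they are not taken from the paper, and the code does not implement reward tweaking itself. The rule of thumb \(1/(1-\gamma)\) for the effective horizon is a common convention and is likewise assumed here.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def effective_horizon(gamma):
    """Common rule of thumb 1 / (1 - gamma); only defined for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


# Hypothetical 50-step trajectories (assumptions for illustration only):
# `early` collects a small reward immediately, `late` a large reward at the end.
early = [1.0] + [0.0] * 49
late = [0.0] * 49 + [10.0]


def preferred(gamma):
    """Name of the trajectory with the larger discounted return under gamma."""
    if discounted_return(early, gamma) > discounted_return(late, gamma):
        return "early"
    return "late"


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 1.0):
        print(gamma, preferred(gamma))
```

Under the total-reward criterion (\(\gamma = 1\)) the `late` trajectory is better, but with \(\gamma = 0.9\) the discounted objective ranks `early` higher. This is the kind of mismatch between the discounted objective and the original finite-horizon task that the abstract describes.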
Related papers
- Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning (2024)
- Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach (2019)
- Hyperbolically-discounted Reinforcement Learning On Reward-punishment Framework (2021)
- Goodhart's Law In Reinforcement Learning (2023)
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021)
- Reward Design For Reinforcement Learning Agents (2025)
- Advantage Shaping As Surrogate Reward Maximization: Unifying Pass@k Policy Gradients (2025)
- Designing Rewards For Fast Learning (2022)