Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning
2024 · Shuyu Yin, Fei Wen, Peilin Liu, et al.
Abstract
The optimal objective is a fundamental aspect of reinforcement learning (RL), as it determines how policies are evaluated and optimized. While total return maximization is the ideal objective in RL, discounted return maximization is the practical objective due to its stability. This can lead to a misalignment of objectives. To better understand the problem, we theoretically analyze the performance gap between the policy that maximizes the total return and the policy that maximizes the discounted return. Our analysis reveals that increasing the discount factor can be ineffective at eliminating this gap when the environment contains cyclic states, a frequent scenario. To address this issue, we propose two alternative approaches to align the objectives. The first approach achieves alignment by modifying the terminal state value, treating it as a tunable hyper-parameter whose suitable range is defined through theoretical analysis. The second approach focuses on calibrating the reward data in trajectories
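To make the misalignment described in the abstract concrete, the following is a minimal toy sketch, not the paper's construction or analysis. It assumes two hypothetical reward sequences (`short_path` and `long_path`, both invented for illustration) and compares them under the total-return and discounted-return objectives. It does not model cyclic states, the terminal-value adjustment, or the reward calibration that the paper proposes.

```python
# Toy illustration (assumed example, not taken from the paper):
# the trajectory preferred under discounted return can differ from
# the one preferred under total (undiscounted) return.

def total_return(rewards):
    """Sum of rewards with no discounting."""
    return sum(rewards)

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the trajectory."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical trajectories: a small immediate reward versus a larger
# reward that arrives only after ten zero-reward steps.
short_path = [1.0]
long_path = [0.0] * 10 + [2.0]

def preferred(gamma=None):
    """Return the name of the better trajectory under the chosen objective.

    gamma=None means the total-return objective.
    """
    if gamma is None:
        score = total_return
    else:
        score = lambda rewards: discounted_return(rewards, gamma)
    return "long_path" if score(long_path) > score(short_path) else "short_path"

if __name__ == "__main__":
    print("total return prefers:", preferred())
    for gamma in (0.9, 0.99):
        print(f"gamma={gamma} prefers:", preferred(gamma))
```

In this toy case, raising the discount factor from 0.9 to 0.99 happens to realign the two objectives. The abstract's claim is that this remedy can fail when the environment contains cyclic states, a situation this sketch does not capture.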
Related papers
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021) 0.00
- Learning Fair Policies In Multiobjective (Deep) Reinforcement Learning With Average And Discounted Rewards (2020) 0.00
- Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning (2022) 0.00
- Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach (2019) 8.60
- Reward Models In Deep Reinforcement Learning: A Survey (2025) 0.00
- A Risk-sensitive Approach To Policy Optimization (2022) 3.58
- Reward Design For Reinforcement Learning Agents (2025) 0.00
- Reward Tweaking: Maximizing The Total Reward While Planning For Short Horizons (2020) 0.00