Using A Logarithmic Mapping To Enable Lower Discount Factors In Reinforcement Learning
2019 · Harm van Seijen, Mehdi Fatemi, Arash Tavakoli
Abstract
In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that the poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the difference in action-gap size across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
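To make the action-gap hypothesis concrete, here is a minimal sketch on a hypothetical toy problem: a deterministic chain whose only reward is 1 for stepping off its right end. The chain, `chain_q_values`, and `action_gaps` are illustrative assumptions, not taken from the paper. With a low discount factor, the gap between the best and second-best action shrinks geometrically with distance to the reward, whereas after a log mapping the gap is identical in every interior state. The plain `math.log` used here is an assumption chosen for illustration; it is not necessarily the exact mapping the paper uses, which must also handle zero and negative values.

```python
import math


def chain_q_values(n_states, gamma, iters=1000):
    """Tabular value iteration on a hypothetical deterministic chain.

    States are 0..n_states-1. Action 0 moves left (clipped at state 0),
    action 1 moves right. Moving right from the last state enters a
    terminal goal and yields reward 1; every other reward is 0.
    """
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(iters):
        v = [max(qs) for qs in q]  # greedy state values from the previous sweep
        new_q = []
        for s in range(n_states):
            left = gamma * v[max(s - 1, 0)]
            right = 1.0 if s == n_states - 1 else gamma * v[s + 1]
            new_q.append([left, right])
        q = new_q
    return q


def action_gaps(q, mapping):
    """Gap between the optimal action (right) and the other action (left),
    measured after applying `mapping` to each Q-value."""
    return [mapping(qs[1]) - mapping(qs[0]) for qs in q]


if __name__ == "__main__":
    gamma = 0.5
    q = chain_q_values(10, gamma)
    interior = q[1:]  # state 0 is excluded: its "left" action is clipped at the wall
    linear = action_gaps(interior, lambda x: x)
    logged = action_gaps(interior, math.log)
    print("linear gaps:", [round(g, 4) for g in linear])
    print("log gaps:   ", [round(g, 4) for g in logged])
```

On this chain, every interior state satisfies Q(s, right) / Q(s, left) = γ^(-2). For γ = 0.5, the linear gaps therefore differ by a factor of 256 between state 1 and state 9, while every log-space gap equals 2 ln 2 ≈ 1.386.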
Related papers
- Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach (2019)
- Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning (2024)
- Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning (2022)
- The Unintended Consequences Of Discount Regularization: Improving Regularization In Certainty Equivalence Reinforcement Learning (2023)
- Provably Efficient Reinforcement Learning For Discounted MDPs With Feature Mapping (2020)
- Hyperbolically-discounted Reinforcement Learning On Reward-punishment Framework (2021)
- Regret Bounds For Discounted MDPs (2020)
- Correcting Discount-factor Mismatch In On-policy Policy Gradient Methods (2023)