Rethinking The Discount Factor In Reinforcement Learning: A Decision Theoretic Approach
2019 · Silviu Pitis
Abstract
Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor \(\gamma < 1\), or in episodic settings, with \(\gamma = 1\). While this has proven effective for specific tasks with well-defined objectives (e.g., games), it has never been established that fixed discounting is suitable for general purpose use (e.g., as a model of human preferences). This paper characterizes rationality in sequential decision making using a set of seven axioms and arrives at a form of discounting that generalizes traditional fixed discounting. In particular, our framework admits a state-action dependent "discount" factor that is not constrained to be less than 1, so long as there is eventual long run discounting. Although this broadens the range of possible preference structures in continuous settings, we show that there exists a unique "optimizing MDP" with fixed \(\gamma <
Related papers
- Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning (2022)
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021)
- Regret Bounds For Discounted MDPs (2020)
- Using A Logarithmic Mapping To Enable Lower Discount Factors In Reinforcement Learning (2019)
- Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning (2024)
- Why Policy Gradient Algorithms Work For Undiscounted Total-reward MDPs (2025)
- Reward Tweaking: Maximizing The Total Reward While Planning For Short Horizons (2020)
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022)