The Unintended Consequences Of Discount Regularization: Improving Regularization In Certainty Equivalence Reinforcement Learning
2023 · Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, et al.
Abstract
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization pa
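The equivalence stated in the abstract can be checked numerically on a small random MDP. The sketch below is illustrative only and is not taken from the paper: the names `value_iteration`, `T_hat`, `mu`, and `eps` are hypothetical. The mixing weight `eps = 1 - gamma_low / gamma` comes from a short derivation (mixing every row of the transition matrix with the same distribution `mu` only adds a constant to all Q-values), which may differ from the parameterization the authors use.

```python
import numpy as np

def value_iteration(T, R, gamma, tol=1e-10, max_iter=100_000):
    """Return the optimal Q-values, shape (S, A), for transitions T (S, A, S) and rewards R (S, A)."""
    Q = np.zeros_like(R)
    for _ in range(max_iter):
        V = Q.max(axis=1)                # greedy state values
        Q_new = R + gamma * (T @ V)      # Bellman optimality backup
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q

rng = np.random.default_rng(0)
S, A = 5, 3
T_hat = rng.dirichlet(np.ones(S), size=(S, A))  # stand-in for an estimated transition model
R = rng.random((S, A))                          # random rewards, so ties are not expected

gamma, gamma_low = 0.99, 0.90

# Discount regularization: plan on T_hat with the lower discount factor.
Q_low = value_iteration(T_hat, R, gamma_low)

# "Prior" view (assumption of this sketch): mix every row of T_hat with the
# same distribution mu, then plan with the original discount factor.
eps = 1.0 - gamma_low / gamma
mu = rng.dirichlet(np.ones(S))                  # any distribution shared by all (s, a)
T_prior = (1.0 - eps) * T_hat + eps * mu        # rows still sum to 1
Q_prior = value_iteration(T_prior, R, gamma)

policy_low = Q_low.argmax(axis=1)
policy_prior = Q_prior.argmax(axis=1)
print("policies match:", np.array_equal(policy_low, policy_prior))
print("spread of Q_prior - Q_low:", np.ptp(Q_prior - Q_low))
```

In this sketch the two Q-functions differ only by a constant shift, so the greedy policies coincide. Note that `eps` is the same for every state-action pair regardless of how much data supports each row of `T_hat`, which is the uniformity the abstract flags as a problem for unevenly sampled data.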
Related papers
- Regularization Matters In Policy Optimization (2019)
- Entropy Regularization With Discounted Future State Distribution In Policy Gradient Methods (2019)
- Comparison And Unification Of Three Regularization Methods In Batch Reinforcement Learning (2021)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- A KL-Regularization Framework For Learning To Plan With Adaptive Priors (2025)
- Using A Logarithmic Mapping To Enable Lower Discount Factors In Reinforcement Learning (2019)
- Temporal Regularization In Markov Decision Process (2018)
- Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning (2022)