Temporal Regularization In Markov Decision Process
2018 · Pierre Thodoroff, Audrey Durand, Joelle Pineau, et al.
Abstract
Several applications of reinforcement learning suffer from instability due to high variance. This is especially prevalent in high-dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.
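As a rough illustration of what smoothing value estimates along a trajectory could look like, the sketch below modifies tabular TD(0) so that the bootstrap term mixes the value of the next state with the value of the previously visited state. This is one possible reading of the idea, not the method as defined in the paper: the function name `td0_temporal_reg`, the mixing weight `beta`, and the exact form of the target are assumptions made for illustration.

```python
def td0_temporal_reg(episodes, n_states, alpha=0.1, gamma=0.9, beta=0.2):
    """Tabular TD(0) with a temporally smoothed bootstrap (illustrative sketch).

    episodes: list of trajectories, each a list of (state, reward, next_state).
    Terminal states should have their own index; they are never updated,
    so their value stays at 0.
    beta: assumed mixing weight; beta = 0 recovers ordinary TD(0).
    """
    v = [0.0] * n_states
    for episode in episodes:
        prev_state = None  # no previous state at the start of a trajectory
        for state, reward, next_state in episode:
            bootstrap = v[next_state]
            if prev_state is not None:
                # Assumed smoothing: blend the next-state value with the
                # value of the state visited just before the current one.
                bootstrap = (1 - beta) * v[next_state] + beta * v[prev_state]
            target = reward + gamma * bootstrap
            v[state] += alpha * (target - v[state])
            prev_state = state
    return v


# Two passes over the same two-step trajectory; state 2 is terminal.
episode = [(0, 0.0, 1), (1, 1.0, 2)]
plain = td0_temporal_reg([episode, episode], 3, alpha=1.0, gamma=1.0, beta=0.0)
smoothed = td0_temporal_reg([episode, episode], 3, alpha=1.0, gamma=1.0, beta=0.5)
print(plain)     # [1.0, 1.0, 0.0]
print(smoothed)  # [1.0, 1.5, 0.0]
```

In this toy run the smoothed estimate for state 1 differs from the unregularized one, which is the kind of bias the abstract says is traded for lower variance.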
Related papers
- Mutual-information Regularization In Markov Decision Processes And Actor-critic Learning (2019)
- Entropic Regularization Of Markov Decision Processes (2019)
- Twice Regularized Markov Decision Processes: The Equivalence Between Robustness And Regularization (2023)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- Comparison And Unification Of Three Regularization Methods In Batch Reinforcement Learning (2021)
- Entropy Regularized Reinforcement Learning Using Large Deviation Theory (2021)
- Regularization Matters In Policy Optimization (2019)
- Regularization Guarantees Generalization In Bayesian Reinforcement Learning Through Algorithmic Stability (2021)