Comparison And Unification Of Three Regularization Methods In Batch Reinforcement Learning
2021 · Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez
Abstract
In batch reinforcement learning, poorly explored state-action pairs can lead to inaccurate learned models and poorly performing policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs), but they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering the methods in this common form shows how the MDP structure and the state-action pair distribution of the batch data set influence their relative performance. We confirm the intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
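As a rough illustration of what a weighted average transition matrix can look like, the sketch below blends an estimated transition matrix with a second matrix using a single weight. The function name, the choice of a uniform matrix as the second term, the weight of 0.3, and the 3-state example are all assumptions made for illustration; the abstract does not specify how each of the three methods maps onto this form.

```python
def weighted_average_transition(t_hat, t_other, weight):
    """Return (1 - weight) * t_hat + weight * t_other, row by row.

    Both inputs are row-stochastic matrices given as lists of lists.
    The name and signature are hypothetical, not taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [(1.0 - weight) * p + weight * q for p, q in zip(row_hat, row_other)]
        for row_hat, row_other in zip(t_hat, t_other)
    ]


# Hypothetical estimate for one fixed action in a 3-state MDP.
t_hat = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumed second term: a uniform transition matrix over the same states.
n_states = len(t_hat)
t_uniform = [[1.0 / n_states] * n_states for _ in range(n_states)]

t_reg = weighted_average_transition(t_hat, t_uniform, weight=0.3)
```

Because both inputs are row-stochastic and the weight lies in [0, 1], each row of the result is a convex combination and still sums to one, so the blended matrix remains a valid transition matrix.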
Related papers
- Temporal Regularization In Markov Decision Process (2018)
- Regularization Matters In Policy Optimization (2019)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- A KL-regularization Framework For Learning To Plan With Adaptive Priors (2025)
- The Unintended Consequences Of Discount Regularization: Improving Regularization In Certainty Equivalence Reinforcement Learning (2023)
- Twice Regularized MDPs And The Equivalence Between Robustness And Regularization (2021)
- Twice Regularized Markov Decision Processes: The Equivalence Between Robustness And Regularization (2023)
- Regularize! Don't Mix: Multi-agent Reinforcement Learning Without Explicit Centralized Structures (2021)