Regularization Guarantees Generalization In Bayesian Reinforcement Learning Through Algorithmic Stability
2021 · Aviv Tamar, Daniel Soudry, Ev Zisselman
Abstract
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has been recently popularized as meta-RL, is to train the agent on a sample of \(N\) problem instances from the prior, with the hope that for large enough \(N\), good generalization behavior to an unseen test instance will be obtained. In this work, we study generalization in Bayesian RL under the probably approximately correct (PAC) framework, using the method of algorithmic stability. Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss -- an approach that is not suitable for RL as Markov decision processes (MDPs) are not convex. Instead, building on recent results of fast
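The sampling approximation described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm or analysis. It assumes a hypothetical prior over multi-armed bandits (single-state MDPs with arm means drawn uniformly from [0, 1]), and it assumes entropy regularization with weight `lam`, chosen only because the regularized optimum then has a closed form (a softmax). All function names are illustrative. The code trains on \(N\) sampled instances, compares the return on those instances with the return on fresh instances from the same prior, and measures how far the regularized policy moves when a single training instance is replaced.

```python
import numpy as np


def sample_instances(rng, n_instances, n_arms=3):
    """Draw bandit instances from a hypothetical prior: arm means ~ Uniform(0, 1)."""
    return rng.uniform(0.0, 1.0, size=(n_instances, n_arms))


def regularized_policy(instances, lam):
    """Maximizer of mean_i <pi, r_i> + lam * entropy(pi) over the simplex.

    For this assumed regularizer the solution is softmax(mean reward / lam).
    """
    logits = instances.mean(axis=0) / lam
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def expected_return(policy, instances):
    """Average (unregularized) return of `policy` over the given instances."""
    return float((instances @ policy).mean())


rng = np.random.default_rng(0)
n_train, lam = 50, 0.5
train = sample_instances(rng, n_train)   # the N training instances
test = sample_instances(rng, 10_000)     # stand-in for unseen test instances

policy = regularized_policy(train, lam)
gap = abs(expected_return(policy, train) - expected_return(policy, test))

# Stability check: replace one training instance and see how far the policy moves.
perturbed = train.copy()
perturbed[0] = sample_instances(rng, 1)[0]
shift = np.abs(regularized_policy(perturbed, lam) - policy).sum()

print(f"train/test gap: {gap:.4f}")
print(f"L1 policy shift after replacing one instance: {shift:.4f}")
```

In this toy setting the policy shift is controlled by the product of `n_train` and `lam`: increasing either one makes the softmax less sensitive to any single training instance, which is the qualitative behavior that a stability argument relies on.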
Related papers
- Regularization Matters In Policy Optimization (2019) 2.68
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022) 0.00
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026) 0.00
- PAC-Bayesian Reinforcement Learning Trains Generalizable Policies (2025) 0.00
- Dynamics Generalization Via Information Bottleneck In Deep Reinforcement Learning (2020) 0.00
- Instance-dependent Confidence And Early Stopping For Reinforcement Learning (2022) 0.00
- Evolving Pareto-optimal Actor-critic Algorithms For Generalizability And Stability (2022) 0.00
- Temporal Regularization In Markov Decision Process (2018) 0.00