PAC-Bayesian Reinforcement Learning Trains Generalizable Policies
2025 · Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, et al.
Abstract
We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining generalization guarantees for reinforcement learning, where the sequential nature of data breaks the independence assumptions underlying classical bounds. The new bound provides non-vacuous certificates for modern off-policy algorithms such as Soft Actor-Critic. We demonstrate the practical utility of the bound through PB-SAC, a novel algorithm that optimizes the bound during training to guide exploration. Experiments across several continuous control tasks show that the proposed approach provides meaningful confidence certificates while maintaining competitive performance.
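To make the role of the mixing time concrete, the following is a minimal sketch and not the bound derived in the paper. It assumes a classical McAllester-style PAC-Bayes bound for losses in [0, 1] and, as a purely illustrative heuristic, replaces the sample count n with an effective sample size n / mixing_time. The function names and the numbers in the usage note are hypothetical.

```python
import math


def mcallester_bound(empirical_risk, kl, n, delta):
    """Classical McAllester-style PAC-Bayes bound for losses in [0, 1].

    Holds with probability at least 1 - delta over n i.i.d. samples:
        risk <= empirical_risk + sqrt((kl + ln(2 * sqrt(n) / delta)) / (2 * n))
    where kl is KL(posterior || prior).
    """
    if n <= 0 or not (0.0 < delta < 1.0) or kl < 0:
        raise ValueError("need n > 0, 0 < delta < 1, kl >= 0")
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return empirical_risk + slack


def effective_sample_size(n, mixing_time):
    """Heuristic assumption (not from the paper): n dependent samples
    are treated as roughly n / mixing_time independent ones."""
    if mixing_time < 1:
        raise ValueError("mixing_time must be >= 1")
    return max(1.0, n / mixing_time)


def mixing_adjusted_bound(empirical_risk, kl, n, mixing_time, delta):
    """Illustrative bound: the i.i.d. bound evaluated at the effective sample size."""
    n_eff = effective_sample_size(n, mixing_time)
    return mcallester_bound(empirical_risk, kl, n_eff, delta)
```

For example, calling `mixing_adjusted_bound(0.1, 5.0, 100_000, 50, 0.05)` yields a looser certificate than the same call with `mixing_time=1`, which reflects the intuition that slowly mixing chains carry less independent information per transition.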
Related papers
- Deep Exploration With PAC-Bayes (2024)
- Statistical Guarantees For Lifelong Reinforcement Learning Using PAC-Bayes Theory (2024)
- Regularization Guarantees Generalization In Bayesian Reinforcement Learning Through Algorithmic Stability (2021)
- Unifying PAC And Regret: Uniform PAC Bounds For Episodic Reinforcement Learning (2017)
- PAC Guarantees For Cooperative Multi-agent Reinforcement Learning With Restricted Communication (2019)
- Neuro-algorithmic Policies Enable Fast Combinatorial Generalization (2021)
- Entropy Augmented Reinforcement Learning (2022)
- Beyond No Regret: Instance-dependent PAC Reinforcement Learning (2021)