Unifying PAC And Regret: Uniform PAC Bounds For Episodic Reinforcement Learning
2017 Β· Christoph Dann, Tor Lattimore, Emma Brunskill
Abstract
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
Authors
(none)
Tags
Stats
Related papers
- Uniform-pac Bounds For Reinforcement Learning With Linear Function Approximation (2021)0.00
- Beyond No Regret: Instance-dependent PAC Reinforcement Learning (2021)0.00
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)0.00
- A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees (2024)0.00
- Beyond Value-function Gaps: Improved Instance-dependent Regret Bounds For Episodic Reinforcement Learning (2021)0.00
- Statistical Guarantees For Lifelong Reinforcement Learning Using Pac-bayes Theory (2024)0.00
- Episodic Reinforcement Learning In Finite Mdps: Minimax Lower Bounds Revisited (2020)0.00
- Square-root Regret Bounds For Continuous-time Episodic Markov Decision Processes (2022)2.26