Beyond No Regret: Instance-dependent PAC Reinforcement Learning
2021 Β· Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
Abstract
The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying \(\epsilon\)-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an \(\epsilon\)-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this is not possible -- there exists a fundamental tradeoff between achieving low regret and identifying an \(\epsilon\)-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity -- yielding a complexity which scales with the suboptimality gaps and the "reachab
Authors
(none)
Tags
Stats
Related papers
- Unifying PAC And Regret: Uniform PAC Bounds For Episodic Reinforcement Learning (2017)0.00
- Instance-dependent Near-optimal Policy Identification In Linear Mdps Via Online Experiment Design (2022)0.00
- Episodic Reinforcement Learning In Finite Mdps: Minimax Lower Bounds Revisited (2020)0.00
- Statistical Guarantees For Lifelong Reinforcement Learning Using Pac-bayes Theory (2024)0.00
- Efficient PAC Reinforcement Learning In Regular Decision Processes (2021)2.26
- Instance-dependent Confidence And Early Stopping For Reinforcement Learning (2022)0.00
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022)5.24
- Pac-bayesian Reinforcement Learning Trains Generalizable Policies (2025)0.00