Instance-dependent Confidence And Early Stopping For Reinforcement Learning
2022 · Koulik Khamaru, Eric Xia, Martin J. Wainwright, et al.
Abstract
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain ex post the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-
Related papers
- Instance-dependent Near-optimal Policy Identification In Linear MDPs Via Online Experiment Design (2022)
- Never Worse, Mostly Better: Stable Policy Improvement In Deep Reinforcement Learning (2019)
- Instance-optimality In Interactive Decision Making: Toward A Non-asymptotic Theory (2023)
- Regularization Guarantees Generalization In Bayesian Reinforcement Learning Through Algorithmic Stability (2021)
- Taming "data-hungry" Reinforcement Learning? Stability In Continuous State-action Spaces (2024)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Model-agnostic Solutions For Deep Reinforcement Learning In Non-ergodic Contexts (2026)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)