Sharp Variance-dependent Bounds In Reinforcement Learning: Best Of Both Worlds In Stochastic And Deterministic Environments
2023 Β· Runlong Zhou, Zihan Zhang, Simon S. Du
Abstract
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a). We apply new analysis techniques to demonstrate that this algorithm enjoys variance-dependent bounds with respect to the norms we propose. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study on model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule. Lastly
Authors
(none)
Tags
Stats
Related papers
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In Mdps (2018)0.00
- Information-theoretic Minimax Regret Bounds For Reinforcement Learning Based On Duality (2024)3.58
- Dynamic Regret Of Online Markov Decision Processes (2022)0.00
- Beyond Value-function Gaps: Improved Instance-dependent Regret Bounds For Episodic Reinforcement Learning (2021)0.00
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear Mdps (2023)0.00
- Regret Analysis In Deterministic Reinforcement Learning (2021)0.00
- Near-optimal Optimistic Reinforcement Learning Using Empirical Bernstein Inequalities (2019)0.00
- Logarithmic Regret Of Exploration In Average Reward Markov Decision Processes (2025)0.00