Episodic Reinforcement Learning In Finite Mdps: Minimax Lower Bounds Revisited
2020 · Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, et al.
Abstract
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a novel lower bound of \(Ω((H^3SA/\epsilon^2)log(1/\delta))\) on the sample complexity of an \((\epsilon,\delta)\)-PAC algorithm for best policy identification in a non-stationary MDP. This lower bound relies on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the \(Ω(\sqrt\{H^3SAT\})\) regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
Authors
(none)
Tags
Stats
Related papers
- Square-root Regret Bounds For Continuous-time Episodic Markov Decision Processes (2022)2.26
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)0.00
- Posterior Sampling-based Online Learning For Episodic Pomdps (2023)0.00
- Efficient Learning In Non-stationary Linear Markov Decision Processes (2020)6.77
- Beyond No Regret: Instance-dependent PAC Reinforcement Learning (2021)0.00
- Nearly Horizon-free Offline Reinforcement Learning (2021)0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58
- Beyond Value-function Gaps: Improved Instance-dependent Regret Bounds For Episodic Reinforcement Learning (2021)0.00