Near-optimal Optimistic Reinforcement Learning Using Empirical Bernstein Inequalities
2019 · Aristide Tossou, Debabrota Basu, Christos Dimitrakakis
Abstract
We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret \(\tilde{\mathcal{O}}(\sqrt{DSAT})\) up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show UCRL-V to be better than the state-of-the-art algorithms.
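To illustrate what a variance-based confidence interval buys, here is a minimal sketch of the generic empirical Bernstein bound of Maurer and Pontil (2009) for samples in [0, 1], compared with a Hoeffding bound. This is not the exact interval used by UCRL-V, which the abstract does not specify; the function names `empirical_bernstein_radius` and `hoeffding_radius` are hypothetical and chosen only for this example.

```python
import math


def empirical_bernstein_radius(samples, delta):
    """One-sided empirical Bernstein radius for i.i.d. samples in [0, 1].

    Generic bound (Maurer & Pontil, 2009), used here as an assumption;
    it is not taken from the UCRL-V paper itself.
    """
    n = len(samples)
    if n < 2:
        return 1.0  # trivial bound: values lie in [0, 1]
    mean = sum(samples) / n
    # Unbiased sample variance (divides by n - 1).
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_radius(n, delta):
    """One-sided Hoeffding radius for i.i.d. samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Low-variance data: the variance term vanishes and the radius shrinks as 1/n.
low_variance = [0.5] * 1000
eb = empirical_bernstein_radius(low_variance, delta=0.05)
hf = hoeffding_radius(len(low_variance), delta=0.05)
print(f"empirical Bernstein: {eb:.4f}, Hoeffding: {hf:.4f}")
```

When the observed variance is small, the empirical Bernstein radius is much tighter than the Hoeffding radius, which is the intuition behind using variance information to sharpen optimistic exploration.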
Related papers
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In MDPs (2018)
- Minimax Regret Bounds For Reinforcement Learning (2017)
- Tightening Exploration In Upper Confidence Reinforcement Learning (2020)
- Fundamental Limits Of Reinforcement Learning In Environment With Endogeneous And Exogeneous Uncertainty (2021)
- Regret Minimization For Reinforcement Learning By Evaluating The Optimal Bias Function (2019)
- Efficient Exploration In Average-reward Constrained Reinforcement Learning: Achieving Near-optimal Regret With Posterior Sampling (2024)
- Posterior Sampling For Reinforcement Learning: Worst-case Regret Bounds (2017)
- Bridging Distributional And Risk-sensitive Reinforcement Learning With Provable Regret Bounds (2022)