Regret Minimization For Reinforcement Learning By Evaluating The Optimal Bias Function
2019 Β· Zihan Zhang, Xiangyang Ji
Abstract
We present an algorithm based on the *Optimism in the Face of Uncertainty* (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently. By evaluating the state-pair difference of the optimal bias function \(h^\{*\}\), the proposed algorithm achieves a regret bound of \(\tilde\{O\}(\sqrt\{SAHT\})\)\footnote\{The symbol \(\tilde\{O\}\) means \(O\) with log factors ignored. \} for MDP with \(S\) states and \(A\) actions, in the case that an upper bound \(H\) on the span of \(h^\{*\}\), i.e., \(sp(h^\{*\})\) is known. This result outperforms the best previous regret bounds \(\tilde\{O\}(S\sqrt\{AHT\}) \)\citep\{fruit2019improved\} by a factor of \(\sqrt\{S\}\). Furthermore, this regret bound matches the lower bound of \(Ξ©(\sqrt\{SAHT\}) \)\citep\{jaksch2010near\} up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of \(\tilde\{O\}(\sqrt\{SADT\})\) for MDPs
Authors
(none)
Tags
Stats
Related papers
- Minimax Regret Bounds For Reinforcement Learning (2017)0.00
- Variance-aware Regret Bounds For Undiscounted Reinforcement Learning In Mdps (2018)0.00
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023)0.00
- Regret Bounds For Discounted Mdps (2020)0.00
- Refined Regret For Adversarial Mdps With Linear Function Approximation (2023)0.00
- First-order Regret In Reinforcement Learning With Linear Function Approximation: A Robust Estimation Approach (2021)0.00
- Near-optimal Optimistic Reinforcement Learning Using Empirical Bernstein Inequalities (2019)0.00
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)0.00