Regret-optimal Model-free Reinforcement Learning For Discounted Mdps With Short Burn-in Time
2023 Β· Xiang Ji, Gen Li
Abstract
A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret optimality or have to incur a high memory and computational cost. In addition, existing optimal algorithms all require a long burn-in time in order to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.
Authors
(none)
Tags
Stats
Related papers
- Model-free Reinforcement Learning In Infinite-horizon Average-reward Markov Decision Processes (2019)0.00
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear Mdps (2023)0.00
- Reinforcement Learning For Infinite-horizon Average-reward Linear Mdps Via Approximation By Discounted-reward Mdps (2024)0.00
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023)0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- Regret Bounds For Discounted Mdps (2020)0.00
- Regret-optimal Q-learning With Low Cost For Single-agent And Federated Reinforcement Learning (2025)0.00