Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes
2023 · Zihan Zhang, Qiaomin Xie
Abstract
We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves \(\widetilde{O}(S^5A^2\mathrm{sp}(h^*)\sqrt{T})\) regret after \(T\) steps, where \(S\times A\) is the size of the state-action space and \(\mathrm{sp}(h^*)\) is the span of the optimal bias function. Our results are the first to achieve optimal dependence on \(T\) for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an \(\epsilon\)-optimal policy using \(\widetilde{O}\left(\frac{SA\mathrm{sp}^2(h^*)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^*)}{\epsilon}\right)\) samples, whereas the minimax lower bound is \(\Omega\left(\frac{SA\mathrm{sp}(h^*)}{\epsilon^2}\right)\).
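To make the simulator-setting bounds concrete, here is a minimal sketch that evaluates the leading terms of the stated sample-complexity upper bound and the minimax lower bound. It assumes the standard definition of span, sp(h) = max_s h(s) - min_s h(s), which the abstract does not spell out, and it drops all constants and logarithmic factors. The function names and the example bias vector are hypothetical and chosen only for illustration.

```python
def span(h):
    """Span of a bias vector, assumed to be max_s h(s) - min_s h(s)."""
    return max(h) - min(h)


def simulator_bounds(S, A, sp, eps):
    """Leading terms of the bounds quoted in the abstract.

    Constants and log factors are omitted, so the values are only
    meaningful for comparing how the two bounds scale.
    """
    upper = S * A * sp**2 / eps**2 + S**2 * A * sp / eps
    lower = S * A * sp / eps**2
    return upper, lower


# Hypothetical optimal bias function on a 3-state MDP (illustrative values).
h_star = [0.0, 1.5, 4.0]
sp = span(h_star)

upper, lower = simulator_bounds(S=3, A=2, sp=sp, eps=0.1)
print(f"span = {sp}, upper ~ {upper:.0f}, lower ~ {lower:.0f}")
```

Under these assumptions, the dominant term of the upper bound exceeds the lower bound by a factor of sp(h*) when epsilon is small.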
Related papers
- Model-free Reinforcement Learning In Infinite-horizon Average-reward Markov Decision Processes (2019)
- A Model-free Learning Algorithm For Infinite-horizon Average-reward MDPs With Near-optimal Regret (2020)
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)
- Reinforcement Learning For Infinite-horizon Average-reward Linear MDPs Via Approximation By Discounted-reward MDPs (2024)
- Regret-optimal Model-free Reinforcement Learning For Discounted MDPs With Short Burn-in Time (2023)
- Learning And Planning In Average-reward Markov Decision Processes (2020)
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)
- Near Sample-optimal Reduction-based Policy Learning For Average Reward MDP (2022)