Provably Efficient Reinforcement Learning For Discounted Mdps With Feature Mapping
2020 Β· Dongruo Zhou, Jiafan He, Quanquan Gu
Abstract
Modern tasks in reinforcement learning have large state and action spaces. To deal with them efficiently, one often uses predefined feature mapping to represent states and actions in a low-dimensional space. In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs), where the transition kernel can be parameterized as a linear function of certain feature mapping. We propose a novel algorithm that makes use of the feature mapping and obtains a \(\tilde O(d\sqrt\{T\}/(1-\gamma)^2)\) regret, where \(d\) is the dimension of the feature space, \(T\) is the time horizon and \(\gamma\) is the discount factor of the MDP. To the best of our knowledge, this is the first polynomial regret bound without accessing the generative model or making strong assumptions such as ergodicity of the MDP. By constructing a special class of MDPs, we also show that for any algorithms, the regret is lower bounded by \(Ξ©(d\sqrt\{T\}/(1-\gamma)^\{1.5\})\). Our upper and lower bou
Authors
(none)
Tags
Stats
Related papers
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear Mdps (2023)0.00
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Reinforcement Learning For Infinite-horizon Average-reward Linear Mdps Via Approximation By Discounted-reward Mdps (2024)0.00
- Regret Bounds For Discounted Mdps (2020)0.00
- Regret-optimal Model-free Reinforcement Learning For Discounted Mdps With Short Burn-in Time (2023)0.00
- Refined Regret For Adversarial Mdps With Linear Function Approximation (2023)0.00
- Model-free Reinforcement Learning In Infinite-horizon Average-reward Markov Decision Processes (2019)0.00
- Near-optimal Policy Optimization Algorithms For Learning Adversarial Linear Mixture Mdps (2021)0.00