Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs

Abstract

We consider infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping. While existing regression-based approaches have been theoretically shown to achieve nearly optimal regret, they are computationally inefficient because they require a large number of optimization runs in each time step, especially when the state and action spaces are large. To address this issue, we propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE), a classic model-based exploration principle in the adaptive control literature for resolving the well-known closed-loop identification problem of Maximum Likelihood Estimation. We formally show that (i) VBMLE enjoys \(\widetilde{O}(d\sqrt{T})\) regret, where \(T\) is the time horizon and \(d\) is the dimension of the model parameter, and (ii) VBMLE i
