Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes

Abstract

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret \(\tilde{O}(d\sqrt{H^3K})\), where \(d\) is the dimension of the feature mapping, \(H\) is the planning horizon, and \(K\) is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer
