Abstract

It is believed that a model-based approach for reinforcement learning (RL) is the key to reduce sample complexity. However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers sample complexity of finding an \(\epsilon\)-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an \(\epsilon\)-optimal policy in a \(\gamma\)-discounted MDP is \(O(K/(1-\gamma)^3\epsilon^2)\), which only depends on the dimensionality \(K\) of the feature space and has no dependence on the state or action space. We further extend our results to

Authors

(none)

Tags

  • Model-Based RL
  • Value-Based

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keycui2020is

Related papers