Abstract

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, the theoretical understanding of these algorithms remains challenging because of the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general \(L\)-layer neural network. We develop new proof techniques and derive an improved \(\tilde{\mathcal{O}}(\epsilon^{-1})\) sample complexity. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an \(\tilde{\mathcal{O}}(\epsilon^{-1})\) complexity under Markovian sampling, as opposed to the best known \(\tilde{\mathcal{O}}(\epsilon^{-2})\) complexity in the existing literature.

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: ke2024an
