Abstract

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto \(B(\theta_0, \omega)\), a ball of fixed radius \(\omega\) around the initial point \(\theta_0\). We show an approximation bound of \(O(\epsilon) + \tilde{O}(1/\sqrt{m})\), where \(\epsilon\) is the approximation quality of the best neural network in \(B(\theta_0, \omega)\) and \(m\) is the width of all hidden layers in the network.
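To make the projection step concrete, the following is a minimal sketch of one projected semi-gradient TD(0) update, not the paper's exact algorithm. It assumes a single hidden ReLU layer of width m with fixed output weights, an l2 ball, and a constant step size. The names `project_to_ball`, `value`, `grad_value`, and `projected_td_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_to_ball(theta, theta0, omega):
    """Euclidean projection of theta onto the ball B(theta0, omega)."""
    diff = theta - theta0
    norm = np.linalg.norm(diff)
    if norm <= omega:
        return theta
    return theta0 + (omega / norm) * diff

def value(theta, s, a_out, m):
    """Assumed model: V(s) = a_out . relu(W s) / sqrt(m), with W = theta reshaped."""
    W = theta.reshape(m, s.shape[0])
    return a_out @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def grad_value(theta, s, a_out, m):
    """Gradient of V(s) with respect to the flattened hidden weights theta."""
    W = theta.reshape(m, s.shape[0])
    active = (W @ s > 0).astype(float)  # ReLU derivative per hidden unit
    return (np.outer(a_out * active, s) / np.sqrt(m)).ravel()

def projected_td_step(theta, theta0, s, r, s_next, a_out, m, gamma, lr, omega):
    """One semi-gradient TD(0) update followed by projection onto B(theta0, omega)."""
    # TD error: bootstrapped target minus current estimate.
    delta = r + gamma * value(theta, s_next, a_out, m) - value(theta, s, a_out, m)
    # Semi-gradient step: the target is treated as a constant.
    theta_new = theta + lr * delta * grad_value(theta, s, a_out, m)
    return project_to_ball(theta_new, theta0, omega)

# Illustrative usage on synthetic transitions (all values are made up).
rng = np.random.default_rng(0)
d, m = 3, 16
theta0 = rng.standard_normal(m * d)
a_out = rng.choice([-1.0, 1.0], size=m)
theta = theta0.copy()
for _ in range(100):
    s, s_next = rng.standard_normal(d), rng.standard_normal(d)
    theta = projected_td_step(theta, theta0, s, 1.0, s_next, a_out, m,
                              gamma=0.9, lr=0.1, omega=0.5)
```

Whatever the step size, the projection keeps every iterate within distance \(\omega\) of \(\theta_0\), which is the constraint the bound above is stated for.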

Authors

(none)


Stats

  • arxiv key: tian2023on
