On The Performance Of Temporal Difference Learning With Neural Networks

Abstract

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper, we provide a convergence analysis of Neural TD Learning with a projection onto \(B(\theta_0, \omega)\), a ball of fixed radius \(\omega\) around the initial point \(\theta_0\). We show an approximation bound of \(O(\epsilon) + \tilde{O}(1/\sqrt{m})\), where \(\epsilon\) is the approximation quality of the best neural network in \(B(\theta_0, \omega)\) and \(m\) is the width of all hidden layers in the network.
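
To make the projection step concrete, the following is a minimal sketch of one projected semi-gradient TD(0) update. It is not the paper's exact algorithm: the single hidden layer, the fixed output signs `b`, the \(1/\sqrt{m}\) output scaling, the constant step size `alpha`, and all function names are assumptions chosen for illustration. Only the projection onto \(B(\theta_0, \omega)\) is taken from the abstract.

```python
import numpy as np


def project_to_ball(theta, theta0, omega):
    """Euclidean projection of theta onto the ball B(theta0, omega)."""
    diff = theta - theta0
    norm = np.linalg.norm(diff)  # Frobenius norm when theta is a matrix
    if norm <= omega:
        return theta
    return theta0 + diff * (omega / norm)


def value(W, b, s):
    """Assumed value network: one hidden ReLU layer of width m, scaled by 1/sqrt(m)."""
    m = W.shape[0]
    return float(b @ np.maximum(W @ s, 0.0)) / np.sqrt(m)


def value_grad(W, b, s):
    """Gradient of value(W, b, s) with respect to W; b is held fixed."""
    m = W.shape[0]
    active = (W @ s > 0.0).astype(float)  # ReLU derivative per hidden unit
    return np.outer(b * active, s) / np.sqrt(m)


def projected_td0_step(W, W0, b, s, r, s_next, gamma, alpha, omega):
    """One semi-gradient TD(0) update, then projection back onto B(W0, omega)."""
    # TD error: reward plus discounted next-state value minus current value.
    delta = r + gamma * value(W, b, s_next) - value(W, b, s)
    W_new = W + alpha * delta * value_grad(W, b, s)
    return project_to_ball(W_new, W0, omega)
```

Under these assumptions, each call keeps the trained weights within distance `omega` of their initial value `W0`, which is the constraint the convergence analysis relies on.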
