Abstract

This paper tackles face recognition in videos using metric learning methods and similarity ranking models. It compares a Siamese network trained with contrastive loss against a triplet network trained with triplet loss, each implemented with the following architectures: Google/Inception, a 3-D Convolutional Network (C3D), and a 2-D Long Short-Term Memory (LSTM) recurrent neural network. We train the networks on both still images and image sequences from videos and compare the performance of these architectures. The dataset used is the YouTube Face Database, which was designed for investigating the problem of face recognition in videos. The contribution of this paper is two-fold. First, the experiments establish that 3-D convolutional networks and 2-D LSTMs with contrastive loss on image sequences do not outperform the Google/Inception architecture with contrastive loss on still images in top-\(n\) rank face retrieval. However, the 3-D Convolution networks and 2-D LS
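As a rough illustration of the two losses named in the abstract, the sketch below computes a contrastive loss for one pair of embeddings and a triplet loss for one (anchor, positive, negative) triple. It assumes a common textbook formulation with Euclidean distance; the abstract does not state the paper's exact variant, so the function names, the margin values, and the use of squared distances in the triplet loss are all illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb1, emb2, same_identity, margin=1.0):
    """Contrastive loss for a single pair (assumed formulation).

    Matching pairs are pulled together (penalty grows with distance);
    non-matching pairs are pushed apart until they are at least
    `margin` away from each other. The margin value is hypothetical.
    """
    d = euclidean(emb1, emb2)
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for a single triple on squared distances (assumed formulation).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`. The margin value is hypothetical.
    """
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)
```

In a Siamese setup the two embeddings would come from the same network applied to two inputs (still images or image sequences), while a triplet setup applies the shared network to three inputs; the plain lists here stand in for those embeddings.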

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 7
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 6.77
  • arxiv key: huo2020unique

Related papers