Abstract

This report describes the submission from the Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. System-1 is an autoencoder-based approach that tries to reconstruct similar i-vectors, whereas System-2 and System-3 are Convolutional Neural Network (CNN) based siamese architectures. The siamese networks have two and three branches, respectively, where each branch is a CNN encoder. The double-branch siamese network performs binary classification using cross-entropy loss during training, whereas the triple-branch siamese network is trained to learn speaker embeddings using triplet loss. We provide results of our systems on the VoxCeleb-1 test set and on the VoxSRC-20 validation and test sets.
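As a rough illustration of the two training objectives named in the abstract, the sketch below computes a triplet loss over three embeddings and a binary cross-entropy loss for a same/different speaker decision. The abstract does not give the distance metric, the margin, or how the pair probability is produced, so the Euclidean distance, the default margin of 1.0, and the function names here are assumptions made for illustration only, not the authors' implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value and the Euclidean metric are assumptions; the
    abstract does not state which ones the triple-branch system uses.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def binary_cross_entropy(prob_same, is_same):
    """Cross-entropy for one same/different speaker decision on a pair.

    prob_same is the predicted probability that both utterances come
    from the same speaker (hypothetical output of a two-branch network).
    """
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob_same, eps), 1.0 - eps)
    return -math.log(p) if is_same else -math.log(1.0 - p)
```

In this sketch the triplet loss falls to zero once the negative embedding is farther from the anchor than the positive by at least the margin, while the cross-entropy term penalizes confident wrong decisions on a pair.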

Authors

(none)

Tags

  • Speech Recognition

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: khan2020the

Related papers