Abstract

We present an approach to the speaker recognition problem using Triplet Neural Networks. Currently, the \(i\)-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique for this problem, owing to its high classification accuracy and relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space on which different classifiers can tackle the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. The training set contains \(i\)-vectors from 3,631 speakers, with only 3 samples for each speaker, which makes speaker recognition a challenging task. When both the training and development sets are used to train the TNN and the baseline model (i.e., similarity evaluation directly on the \(i\)-vector representation), our proposed model outperforms the baseline by 23%. When reducing the training data to only using the tra
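
As a rough illustration of the idea behind a triplet network, the sketch below computes a hinge-style triplet loss on toy vectors. It is not taken from the paper: the Euclidean distance, the margin of 1.0, the function names, and the 2-D example vectors are all assumptions chosen for readability. In practice the inputs would be learned embeddings of \(i\)-vectors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, margin is hypothetical).

    The loss is zero once the negative sample is at least `margin`
    farther from the anchor than the positive sample is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D "embeddings": anchor and positive stand for the same speaker,
# negative stands for a different speaker.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Well-separated triplet: the constraint is already satisfied.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the constraint and yields a positive loss.
loss_hard = triplet_loss(anchor, negative, positive)
```

Minimizing such a loss over many triplets pulls samples of the same speaker together and pushes different speakers apart, which is what makes the resulting latent space usable by a downstream classifier.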

Authors

(none)

Tags

  • Speaker Analysis

Stats

  • citations: 4
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 5.24
  • arxiv key: cheuk2019latent

Related papers