Abstract

End-to-end speaker verification systems have received increasing interest. The traditional i-vector approach trains a generative model (essentially a factor-analysis model) to extract i-vectors as speaker embeddings. In contrast, the end-to-end approach directly trains a discriminative model (often a neural network) to learn discriminative speaker embeddings; a crucial component is the training criterion. In this paper, we use angular softmax (A-softmax), which was originally proposed for face verification, as the loss function for feature learning in end-to-end speaker verification. By introducing margins between classes into the softmax loss, A-softmax can learn more discriminative features than softmax loss and triplet loss, while remaining easy and stable to use. We make two contributions in this work. 1) We introduce A-softmax loss into end-to-end speaker verification and achieve significant equal error rate (EER) reductions. 2) We find that the combination of using A-softmax in training the f
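To make the idea of an angular margin concrete, the following is a minimal NumPy sketch of an A-softmax-style loss. It is an illustration under stated assumptions, not the authors' implementation: the function name `a_softmax_loss`, the default margin `m=4`, and the array shapes are all hypothetical choices. It follows the commonly described formulation in which class weights are L2-normalized, biases are dropped, and the target-class logit uses psi(theta) = (-1)^k cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m].

```python
import numpy as np

def a_softmax_loss(features, weights, labels, m=4):
    """Sketch of an A-softmax-style loss (hypothetical helper, not from the paper).

    features: (N, D) embeddings, weights: (D, C) class weights,
    labels: (N,) integer class ids, m: integer angular margin (assumed value).
    """
    # Normalize each class weight vector so logits depend only on angle and feature norm.
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    x_norm = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    cos_theta = np.clip(features @ w / x_norm, -1.0, 1.0)  # (N, C)

    idx = np.arange(len(labels))
    theta_y = np.arccos(cos_theta[idx, labels])  # angle to the target class

    # Piecewise monotonic extension of cos(m * theta) over [0, pi].
    k = np.minimum(np.floor(m * theta_y / np.pi), m - 1)
    sign = 1.0 - 2.0 * (k % 2)  # equals (-1)^k
    psi = sign * np.cos(m * theta_y) - 2.0 * k

    # Plain cosine logits for non-target classes, margin logit for the target class.
    logits = x_norm * cos_theta
    logits[idx, labels] = x_norm[:, 0] * psi

    # Numerically stable log-softmax followed by mean cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()
```

With `m=1` the function reduces to softmax cross-entropy with normalized weights and no bias. Larger `m` lowers the target-class logit for the same angle, which is what forces a margin between classes during training.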

Authors

(none)

Tags

  • Speaker Analysis

Stats

  • citations: 30
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 11.19
  • arxiv key: li2018angular

Related papers