Abstract

In this report, we describe the submission of ShaneRun's team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34 as encoder to extract the speaker embeddings, which is referenced from the open-source voxceleb-trainer. We also provide a simple method to implement optimum fusion using t-SNE normalized distance of testing utterance pairs instead of original negative Euclidean distance from the encoder. The final submitted system got 0.3098 minDCF and 5.076 % ERR for Fixed data track, which outperformed the baseline by 1.3 % minDCF and 2.2 % ERR respectively.

Authors

(none)

Tags

  • Speech Recognition

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keychen2020shanerun

Related papers