Abstract

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called `x-vectors') are not Gaussian, which causes performance degradation with the famous PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space where the mapped latent codes are more Gaussian and hence more suitable for PLDA scoring.

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 11
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 8.09
  • arxiv key: zhang2019vae

Related papers