Length- And Noise-aware Training Techniques For Short-utterance Speaker Recognition
2020 · Wenda Chen, Jonathan Huang, Tobias Bocklet
Abstract
Speaker recognition performance has improved greatly with the emergence of deep learning. Deep neural networks have shown the capacity to deal effectively with the effects of noise and reverberation, making them attractive for far-field speaker recognition systems. The x-vector framework is a popular choice for generating speaker embeddings in recent literature because of its robust training mechanism and excellent performance on various test sets. In this paper, we build on earlier work that incorporates invariant representation learning (IRL) into the loss function, and we modify the approach with centroid alignment (CA) and length variability cost (LVC) techniques to further improve robustness in noisy, far-field applications. This work focuses mainly on improvements for short-duration test utterances (1-8 s), and we also present improved results on long-duration tasks. In addition, this work discusses a novel self-attention mechanism. On the VOiCES far-field corpus, the combination of the proposed technique
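The abstract names IRL, CA and LVC but does not give their formulas. As a rough illustration only, the sketch below shows a generic invariance-penalty objective in the spirit of IRL: a speaker-classification loss plus a term that pulls an utterance's embedding toward the embeddings of perturbed "views" of the same utterance, where a view could be a noise-augmented copy or a short random crop (loosely echoing the length-variability idea). The cosine distance, the averaging over views, the `weight` default and all function names (`invariance_loss`, `random_crop`, etc.) are hypothetical assumptions, not the authors' formulation; centroid alignment and the self-attention mechanism are not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def random_crop(num_frames: int, min_frames: int, max_frames: int,
                rng: random.Random) -> Tuple[int, int]:
    """Hypothetical helper: pick a random [start, end) window whose length is
    drawn from [min_frames, max_frames] and clipped to the utterance length.
    Here it stands in for producing a short 'view' of an utterance."""
    length = min(rng.randint(min_frames, max_frames), num_frames)
    start = rng.randint(0, num_frames - length)
    return start, start + length


def invariance_loss(logits: Sequence[float], label: int,
                    anchor_emb: Sequence[float],
                    view_embs: List[Sequence[float]],
                    weight: float = 0.1) -> float:
    """Hypothetical combined objective: classification loss on the anchor plus
    `weight` times the mean cosine distance between the anchor embedding and
    the embeddings of its perturbed views (noisy or shortened copies)."""
    ce = cross_entropy(logits, label)
    if not view_embs:
        return ce
    penalty = sum(cosine_distance(anchor_emb, v) for v in view_embs) / len(view_embs)
    return ce + weight * penalty
```

Assuming features at 100 frames per second (a 10 ms hop, also an assumption), 1-8 s crops would correspond to `random_crop(n, 100, 800, rng)`. Averaging over views keeps the penalty's scale independent of how many views are used, so `weight` alone sets the trade-off between speaker discrimination and invariance to noise or duration.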
Related papers
- Deep Speaker Embeddings For Far-field Speaker Recognition On Short Utterances (2020)
- STC Speaker Recognition Systems For The Voices From A Distance Challenge (2019)
- Utterance-level Aggregation For Speaker Recognition In The Wild (2019)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)
- A Deep Neural Network For Short-segment Speaker Recognition (2019)
- Deep Neural Network Based I-vector Mapping For Speaker Verification Using Short Utterances (2018)
- Within-sample Variability-invariant Loss For Robust Speaker Recognition Under Noisy Environments (2020)
- Obovox Far Field Speaker Recognition: A Novel Data Augmentation Approach With Pretrained Models (2024)