Analysis Of Length Normalization In End-to-end Speaker Verification System
2018 · Weicheng Cai, Jinkun Chen, Ming Li
Abstract
The classical i-vectors and the more recent end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems. Traditionally, once i-vectors or deep speaker embeddings are extracted, an extra length normalization step projects the representations onto the unit-length hypersphere before back-end modeling. In this paper, we explore how a neural network can learn length-normalized deep speaker embeddings in an end-to-end manner. To this end, we add a length normalization layer followed by a scale layer before the output layer of a standard classification network. We conduct experiments on the speaker verification task of the VoxCeleb1 dataset. The results show that integrating this simple step into the end-to-end training pipeline significantly improves speaker verification performance. At test time, scoring our L2-normalized end-to-end system with a simple inner product achieves state-of-the-art performance.
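The abstract describes two mechanisms: during training, the embedding is L2-normalized and then multiplied by a scale factor before the speaker-classification output layer; during testing, two embeddings are compared with an inner product. The following is a minimal sketch of both in plain Python, assuming a single embedding vector and a bias-free linear output layer. The function names and the scale value `alpha` are hypothetical illustrations, not taken from the paper (the abstract does not say whether the scale is fixed or learned); a real system would implement these as layers in a deep learning framework.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(x: Vector, eps: float = 1e-12) -> Vector:
    """Project an embedding onto the unit hypersphere (length normalization)."""
    norm = math.sqrt(sum(v * v for v in x))
    # eps guards against division by zero for an all-zero vector
    return [v / max(norm, eps) for v in x]


def scale_layer(x: Vector, alpha: float) -> Vector:
    """Rescale a unit-length embedding to radius alpha (alpha is a hypothetical value)."""
    return [alpha * v for v in x]


def output_logits(z: Vector, weights: List[Vector]) -> Vector:
    """Bias-free linear output layer: one logit per training speaker."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]


def training_forward(embedding: Vector, weights: List[Vector], alpha: float) -> Vector:
    """Training path: length normalization -> scale -> classification logits."""
    return output_logits(scale_layer(l2_normalize(embedding), alpha), weights)


def verification_score(emb_a: Vector, emb_b: Vector) -> float:
    """Test path: inner product of L2-normalized embeddings (equal to cosine similarity)."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Two embeddings pointing in the same direction score close to 1.0
    print(verification_score([3.0, 4.0], [6.0, 8.0]))
    # Logits for two hypothetical training speakers with alpha = 10.0
    print(training_forward([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]], alpha=10.0))
```

Because every embedding lies on the same hypersphere after normalization, the inner product depends only on the angle between two embeddings, not on their original magnitudes. The scale factor sets the radius of that sphere during training, which controls how large the logits fed to the softmax can become.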
Related papers
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)
- Spatial Pyramid Encoding With Convex Length Normalization For Text-independent Speaker Verification (2019)
- Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions (2024)
- Exploring The Encoding Layer And Loss Function In End-to-end Speaker And Language Recognition System (2018)
- Unified Hypersphere Embedding For Speaker Recognition (2018)
- Length- And Noise-aware Training Techniques For Short-utterance Speaker Recognition (2020)
- End-to-end DNN Based Speaker Recognition Inspired By I-vector And PLDA (2017)