Dr-vectors: Decision Residual Networks And An Improved Loss For Speaker Recognition
2021 · Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno
Abstract
Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, did not account for utterance-specific uncertainty. In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry and additional non-linear information. This is achieved by incorporating a 2nd-stage neural network (known as a decision network) as part of an end-to-end training regimen. In particular, we propose the concept of decision residual networks, in which a compact decision network leverages cosine scores and models only the residual signal that remains. Additionally, we present a modification to the generalized end-to-end softmax loss function that targets the separation of same-speaker and different-speaker scores. We observed significant performance gains for both techniques.
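The decision residual idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the class name `ResidualScorer`, the single hidden layer, the layer sizes, and the choice to feed the concatenated enroll/test embeddings to the network are all assumptions made here for illustration. The sketch only shows the general shape of "cosine score plus a learned residual", and it does not model utterance-specific uncertainty.

```python
import math
import random


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


class ResidualScorer:
    """Hypothetical scorer: cosine(enroll, test) + small_mlp([enroll; test]).

    Assumption: the residual network sees the ordered concatenation of the
    two embeddings, so it can treat enrollment and test inputs differently.
    """

    def __init__(self, dim, hidden=8, seed=0, init_scale=0.0):
        rng = random.Random(seed)
        # Hidden layer weights: small random values (untrained, illustrative).
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Output layer scaled by init_scale; 0.0 makes the residual start at
        # zero, so the scorer initially reproduces plain cosine scoring.
        self.w2 = [init_scale * rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def residual(self, enroll, test):
        x = list(enroll) + list(test)
        # One ReLU hidden layer followed by a linear output unit.
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def score(self, enroll, test):
        # Final decision score: cosine baseline plus learned correction.
        return cosine(enroll, test) + self.residual(enroll, test)


enroll = [0.2, -0.1, 0.7, 0.4]
test = [0.1, 0.0, 0.6, 0.5]
scorer = ResidualScorer(dim=4)
baseline = cosine(enroll, test)
combined = scorer.score(enroll, test)
```

With the output layer initialized to zero, `combined` equals `baseline`, so training such a network would only need to learn a correction on top of the cosine score. Because the input is an ordered concatenation, a trained residual is free to score `(enroll, test)` and `(test, enroll)` differently, which is one simple way to permit the enroll/test asymmetry the abstract mentions.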
Related papers
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions (2024)
- End-to-end Residual CNN With L-GM Loss Speaker Verification System (2018)
- Speakernet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification (2020)
- On Deep Speaker Embeddings For Text-independent Speaker Recognition (2018)
- Length- And Noise-aware Training Techniques For Short-utterance Speaker Recognition (2020)
- Within-sample Variability-invariant Loss For Robust Speaker Recognition Under Noisy Environments (2020)
- Unified Hypersphere Embedding For Speaker Recognition (2018)