A Deep Neural Network For Short-segment Speaker Recognition
2019 · Amirhossein Hajavi, Ali Etemad
Abstract
Today's interactive devices, such as smartphone assistants and smart speakers, often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices are better served by models that can perform the recognition task on short-duration utterances. In this paper, we propose UtterIdNet, a new deep neural network capable of performing speaker recognition with short speech segments. The proposed model uses a novel architecture that suits short-segment speaker recognition by making more efficient use of the information in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations across segment durations show consistent and stable performance for short segments, with significant improvement over previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).
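To make the segment durations in the abstract concrete, the sketch below shows one way to cut fixed-length test segments of 250 ms, 500 ms, 1 s, and 2 s from a longer utterance. It is illustrative only and is not the authors' evaluation code: the helper name `crop_segment`, the 16 kHz sample rate, and the all-zero placeholder waveform are assumptions introduced here.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def crop_segment(waveform, duration_s, sample_rate=SAMPLE_RATE, offset_s=0.0):
    """Return a fixed-length slice of `waveform` (hypothetical helper).

    duration_s and offset_s are in seconds; the slice starts at offset_s.
    """
    start = int(round(offset_s * sample_rate))
    length = int(round(duration_s * sample_rate))
    if start + length > len(waveform):
        raise ValueError("utterance is shorter than the requested segment")
    return waveform[start:start + length]


# Placeholder input: 3 seconds of silence standing in for a real utterance.
utterance = np.zeros(3 * SAMPLE_RATE, dtype=np.float32)

# One segment per duration mentioned in the abstract.
durations_s = (0.25, 0.5, 1.0, 2.0)
segments = {d: crop_segment(utterance, d) for d in durations_s}

for d, seg in segments.items():
    print(f"{d:>4} s -> {len(seg)} samples")
```

At the assumed 16 kHz rate, a 250 ms segment holds only 4,000 samples, which gives a sense of how little signal a sub-second model has to work with.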
Related papers
- Deep Speaker Embeddings For Far-field Speaker Recognition On Short Utterances (2020) 11.29
- DNN Based Speaker Recognition On Short Utterances (2016) 0.00
- Utterance-level Aggregation For Speaker Recognition In The Wild (2019) 0.00
- SpeakerNet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification (2020) 0.00
- Length- And Noise-aware Training Techniques For Short-utterance Speaker Recognition (2020) 0.00
- Deep Neural Network Based I-vector Mapping For Speaker Verification Using Short Utterances (2018) 0.00
- Universal Speaker Recognition Encoders For Different Speech Segments Duration (2022) 4.52
- Meta-learning For Short Utterance Speaker Recognition With Imbalance Length Pairs (2020) 15.61