Deep Speaker Feature Learning For Text-independent Speaker Verification
2017 · Lantian Li, Yixiang Chen, Ying Shi, et al.
Abstract
Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when they are applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrate that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the equal error rate (EER) can be as low as 7.68%. This effectively confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames.
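The abstract reports its result as an equal error rate, the operating point at which the false acceptance rate and the false rejection rate coincide. As a minimal sketch of how that metric is conventionally computed from verification trial scores, the following may help; the function name `equal_error_rate` and the toy scores are hypothetical illustrations and are not taken from the paper or its evaluation setup.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    Hypothetical helper for illustration only; higher scores are assumed
    to mean "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection: a same-speaker trial scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: a different-speaker trial scored at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, made up for this example.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite trial lists the two error curves rarely cross exactly, so this sketch averages them at the closest point; evaluation toolkits may interpolate instead, which can give slightly different values.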
Related papers
- Full-info Training For Deep Speaker Feature Learning (2017)
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019)
- DNN Based Speaker Recognition On Short Utterances (2016)
- FDN: Finite Difference Network With Hierarchical Convolutional Features For Text-independent Speaker Verification (2021)
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017)
- Deep CNN Based Feature Extractor For Text-prompted Speaker Recognition (2018)
- SpeakerNet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification (2020)