Full-info Training For Deep Speaker Feature Learning
2017 · Lantian Li, Zhiyuan Tang, Dong Wang, et al.
Abstract
Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model. By forcing the model to discriminate the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. Despite its good performance, the present model has a potential problem: it involves a parametric classifier, i.e., the last affine layer, which may absorb some of the discriminative knowledge and thus cause an 'information leak' for feature learning. This paper presents a full-info training approach that discards the parametric classifier and forces all the discriminative knowledge to be learned by the feature net. Our experiments on the Fisher database demonstrate that this new training scheme produces more coherent features, leading to consistent and notable performance improvements on the speaker verification task.
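The abstract contrasts a conventional head, where a trainable affine layer sits between the last hidden layer and the speaker-classification loss, with a setup that has no parametric classifier. The abstract does not say what replaces the affine layer. The numpy sketch below therefore assumes one plausible parameter-free choice: logits are scaled cosine similarities between each frame feature and per-speaker centroids computed from the batch's own features. This may differ from the authors' actual method. It also assumes two common d-vector conventions for verification: frame-level features are averaged into an utterance vector, and trials are scored by cosine similarity. All function names, the `scale=10.0` value and the toy data are hypothetical. The sketch computes forward losses only and performs no training.

```python
import numpy as np


def softmax_xent(logits, labels):
    """Frame-averaged softmax cross-entropy (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def parametric_logits(features, W, b):
    """Conventional head: a trainable affine layer maps features to speaker logits."""
    return features @ W + b


def centroid_logits(features, labels, num_speakers, scale=10.0):
    """Hypothetical parameter-free head (an assumption, not the paper's stated rule).

    Each speaker is represented by the mean of its own L2-normalised features,
    and logits are scaled cosine similarities to these centroids. No trainable
    weights sit between the features and the loss, so all discrimination must
    be carried by the features themselves.
    Assumes every speaker in range(num_speakers) has at least one frame.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    centroids = np.stack([normed[labels == s].mean(axis=0) for s in range(num_speakers)])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return scale * normed @ centroids.T


def utterance_vector(frame_features):
    """Average frame-level features into one utterance-level speaker vector (assumed)."""
    return frame_features.mean(axis=0)


def cosine_score(a, b):
    """Cosine similarity used as a verification score (assumed back-end)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy batch: random stand-ins for last-hidden-layer outputs of a feature net.
rng = np.random.default_rng(0)
num_speakers, dim, frames_per_spk = 3, 8, 4
labels = np.repeat(np.arange(num_speakers), frames_per_spk)
# Give each speaker a distinct offset on its own dimension so classes are separable.
features = rng.normal(size=(labels.size, dim)) + 3.0 * np.eye(num_speakers, dim)[labels]

# Parametric head: W and b would be trained jointly with the feature net.
W = rng.normal(scale=0.1, size=(dim, num_speakers))
b = np.zeros(num_speakers)
loss_param = softmax_xent(parametric_logits(features, W, b), labels)

# Parameter-free head: the loss depends on the features alone.
loss_centroid = softmax_xent(centroid_logits(features, labels, num_speakers), labels)

# Verification: compare utterance vectors built from two halves of speaker 0's frames.
enroll = utterance_vector(features[labels == 0][:2])
trial = utterance_vector(features[labels == 0][2:])
print(loss_param, loss_centroid, cosine_score(enroll, trial))
```

In the parametric case, part of the separation between speakers can be stored in `W` and `b`. In the centroid variant, the same loss can only be lowered by moving the features themselves, which is the intuition behind avoiding an "information leak" into the classifier.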
Related papers
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017) · 12.54
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019) · 10.61
- FDN: Finite Difference Network With Hierarchical Convolutional Features For Text-independent Speaker Verification (2021) · 0.00
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017) · 8.60
- Deep Factorization For Speech Signal (2018) · 8.82
- Parameterized Channel Normalization For Far-field Deep Speaker Verification (2021) · 3.58
- DNN Based Speaker Recognition On Short Utterances (2016) · 0.00
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) · 0.00