Deep CNN Based Feature Extractor For Text-prompted Speaker Recognition
2018 · Sergey Novoselov, Oleg Kudashev, Vadim Schemelinin, et al.
Abstract
Deep learning is still not a very common tool in the speaker verification field. We study the performance of deep convolutional neural networks in the text-prompted speaker verification task. The prompted passphrase is segmented into word states (i.e., digits) so that each digit utterance can be tested separately. We train a single high-level feature extractor for all states and use the cosine similarity metric for scoring. The key feature of our network is the Max-Feature-Map activation function, which acts as an embedded feature selector. By using a multitask learning scheme to train the high-level feature extractor, we were able to surpass the classic baseline systems in terms of quality and achieved impressive results for such a novel approach, reaching 2.85% EER on the RSR2015 evaluation set. Fusion of the proposed and the baseline systems improves this result further.
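The two mechanisms named in the abstract, the Max-Feature-Map (MFM) activation and cosine scoring of per-digit features, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, pairing channel i with channel i + n/2 follows the common MFM formulation and is assumed here, and averaging the per-digit scores into one passphrase score is also an assumption, since the abstract does not say how digit scores are combined.

```python
import math
from typing import List, Sequence


def max_feature_map(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """MFM activation: split the channels into two halves and keep the
    element-wise maximum, halving the channel count.

    Assumption: channel i is paired with channel i + n/2.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[i], channels[i + half])]
        for i in range(half)
    ]


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for degenerate all-zero vectors
    return dot / (norm_x * norm_y)


def passphrase_score(
    enroll_digits: Sequence[Sequence[float]],
    test_digits: Sequence[Sequence[float]],
) -> float:
    """Score a prompted passphrase digit by digit.

    Assumption: the per-digit cosine scores are simply averaged.
    """
    if not enroll_digits or len(enroll_digits) != len(test_digits):
        raise ValueError("need the same non-zero number of digit features")
    scores = [cosine_similarity(e, t) for e, t in zip(enroll_digits, test_digits)]
    return sum(scores) / len(scores)
```

Because MFM keeps only the larger of two competing responses at each position, it behaves like a learned selector between feature maps, which is the sense in which the abstract calls it an embedded feature selector.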
Related papers
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017) 12.54
- On Residual CNN In Text-dependent Speaker Verification Task (2017) 7.16
- A Comparative Re-assessment Of Feature Extractors For Deep Speaker Embeddings (2020) 8.09
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019) 10.61
- On Deep Speaker Embeddings For Text-independent Speaker Recognition (2018) 11.93
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) 0.00
- SpeakerNet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification (2020) 0.00
- End-to-end Attention Based Text-dependent Speaker Verification (2017) 14.87