Multi-stream Convolutional Neural Network With Frequency Selection For Robust Speaker Verification
2020 · Wei Yao, Shen Chen, Jiamin Cui, et al.
Abstract
Speaker verification aims to verify whether an input utterance belongs to the claimed speaker. Conventionally, such systems are deployed in a single-stream scenario, in which the feature extractor operates over the full frequency range. In this paper, we hypothesize that a machine can learn enough to perform the classification task when listening to a partial frequency range instead of the full range, a technique we call frequency selection, and we further propose a novel multi-stream Convolutional Neural Network (CNN) framework that uses this technique for speaker verification. The proposed framework accommodates diverse temporal embeddings generated from multiple streams to enhance the robustness of acoustic modeling. To diversify the temporal embeddings, we consider feature augmentation with frequency selection, which manually segments the full frequency band into several sub-bands, so that the feature extractor of each stream can select which sub-bands
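The frequency selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`split_subbands`, `select_subbands`), the feature size (40 frequency bins by 100 frames), the number of sub-bands (4), and the per-stream selections are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def split_subbands(features, num_subbands):
    """Split a (freq_bins, frames) matrix into contiguous sub-bands along frequency."""
    return np.array_split(features, num_subbands, axis=0)


def select_subbands(features, num_subbands, selected):
    """Keep only the chosen sub-bands and stack them back along the frequency axis."""
    bands = split_subbands(features, num_subbands)
    return np.concatenate([bands[i] for i in selected], axis=0)


# Hypothetical input: 40 frequency bins x 100 frames of random features.
features = np.random.default_rng(0).standard_normal((40, 100))

# Hypothetical selections: each stream sees a different subset of the 4 sub-bands.
stream_selections = [(0, 1), (1, 2, 3), (0, 1, 2, 3)]

# One input per stream; each would feed that stream's feature extractor.
stream_inputs = [select_subbands(features, 4, sel) for sel in stream_selections]
shapes = [x.shape for x in stream_inputs]
```

Each entry of `stream_inputs` keeps the full time axis but only part of the frequency axis, so the streams see different views of the same utterance. The last selection covers every sub-band and therefore reproduces the full-band input.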
Related papers
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019) 0.00
- Frequency And Multi-scale Selective Kernel Attention For Speaker Verification (2022) 10.07
- Speaker Verification In Multi-speaker Environments Using Temporal Feature Fusion (2022) 0.00
- Speaker Verification Using Convolutional Neural Networks (2018) 0.00
- RSKNet-MTSP: Effective And Portable Deep Architecture For Speaker Verification (2021) 9.03
- Multi-channel Speaker Verification For Single And Multi-talker Speech (2020) 0.00
- An Improved Deep Neural Network For Modeling Speaker Characteristics At Different Temporal Scales (2020) 6.34
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022) 7.50