Speaker Recognition From Raw Waveform With SincNet
2018 · Mirco Ravanelli, Yoshua Bengio
Abstract
Deep learning is steadily gaining popularity as an alternative to i-vectors for speaker recognition. Promising results have recently been obtained with Convolutional Neural Networks (CNNs) fed directly with raw speech samples. Rather than relying on standard hand-crafted features, these CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants. Proper design of the neural network is crucial to achieving this goal. This paper proposes a novel CNN architecture, called SincNet, that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. In contrast to standard CNNs, which learn all elements of each filter, the proposed method learns only the low and high cutoff frequencies directly from data. This offers a very compact and efficient way to
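The idea of a filter defined only by its two cutoff frequencies can be illustrated with a minimal sketch. It assumes the usual construction of a band-pass filter as the difference of two ideal low-pass (sinc) filters, smoothed here with a Hamming window. The function names `sinc_bandpass` and `gain_at`, the kernel length of 251 taps, and the 16 kHz sample rate are illustrative assumptions, not details taken from the text above. In a network the two cutoffs would be trainable parameters; here they are fixed numbers.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Band-pass FIR kernel determined only by its low and high cutoffs.

    kernel_size (assumed odd) and sample_rate are illustrative defaults.
    """
    # Cutoffs expressed in cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    half = (kernel_size - 1) / 2
    kernel = []
    for i in range(kernel_size):
        n = i - half  # time index centered on zero
        # Difference of two ideal low-pass filters gives a band-pass filter.
        ideal = 2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
        # Hamming window reduces the ripple caused by truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(k * math.cos(w * i) for i, k in enumerate(kernel))
    im = sum(k * math.sin(w * i) for i, k in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    kernel = sinc_bandpass(300.0, 3400.0)
    for f in (1500.0, 6000.0):
        print(f"gain at {f:.0f} Hz: {gain_at(kernel, f):.4f}")
```

All 251 taps of this kernel follow from two numbers, whereas a standard convolutional filter of the same length would have 251 free weights. That is the sense in which the parametrization is compact.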
Related papers
- Speech And Speaker Recognition From Raw Waveform With SincNet (2018) 0.00
- PF-Net: Personalized Filter For Speaker Recognition From Raw Waveform (2021) 3.58
- Curricular SincNet: Towards Robust Deep Speaker Recognition By Emphasizing Hard Samples In Latent Space (2021) 4.52
- Additive Margin SincNet For Speaker Recognition (2019) 7.16
- What Do Neural Networks Listen To? Exploring The Crucial Bands In Speech Enhancement Using Sinc-convolution (2024) 2.26
- Improved RawNet With Feature Map Scaling For Text-independent Speaker Verification Using Raw Waveforms (2020) 14.15
- RawNet: Advanced End-to-end Deep Neural Network Using Raw Waveforms For Text-independent Speaker Verification (2019) 15.34
- Learning Multiscale Features Directly From Waveforms (2016) 0.00