Speech And Speaker Recognition From Raw Waveform With SincNet
2018 · Mirco Ravanelli, Yoshua Bengio
Abstract
Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition is to discover these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band-pass filters are directly lear
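To make the idea of a filter defined only by its two cutoff frequencies concrete, here is a minimal sketch in plain Python. It builds one band-pass kernel as the difference of two windowed sinc low-pass filters, so the whole kernel is determined by `f_low` and `f_high`. The function names, the default kernel size of 101 taps, the 16 kHz sample rate, and the Hamming window are illustrative assumptions, not details taken from the abstract, and no training loop is shown.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, kernel_size=101, sample_rate=16000):
    """Return one band-pass FIR kernel defined by two cutoff frequencies in Hz.

    Hypothetical helper: in a SincNet-style layer only f_low and f_high
    would be trainable, and every tap below is derived from them.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    if not 0 <= f_low < f_high <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= sample_rate / 2")

    f1 = f_low / sample_rate   # normalized cutoffs in cycles per sample
    f2 = f_high / sample_rate
    half = (kernel_size - 1) // 2

    kernel = []
    for i in range(kernel_size):
        n = i - half  # tap index centred on zero
        # Difference of two ideal low-pass responses gives a band-pass.
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        # Hamming window (an assumption here) smooths the truncation.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def magnitude_response(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * k) for k, h in enumerate(kernel))
    im = -sum(h * math.sin(w * k) for k, h in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    kernel = sinc_bandpass(300.0, 3400.0)
    for f in (1500.0, 6000.0):
        print(f, round(magnitude_response(kernel, f), 4))
```

In this sketch a 101-tap filter is described by 2 numbers rather than 101 free weights, which is the contrast with a standard CNN first layer that the abstract draws.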
Related papers
- Speaker Recognition From Raw Waveform With SincNet (2018) 20.65
- PF-Net: Personalized Filter For Speaker Recognition From Raw Waveform (2021) 3.58
- Curricular SincNet: Towards Robust Deep Speaker Recognition By Emphasizing Hard Samples In Latent Space (2021) 4.52
- RawNet: Advanced End-to-end Deep Neural Network Using Raw Waveforms For Text-independent Speaker Verification (2019) 15.34
- What Do Neural Networks Listen To? Exploring The Crucial Bands In Speech Enhancement Using Sinc-convolution (2024) 2.26
- Additive Margin SincNet For Speaker Recognition (2019) 7.16
- Improved RawNet With Feature Map Scaling For Text-independent Speaker Verification Using Raw Waveforms (2020) 14.15
- Raw Waveform-based Speech Enhancement By Fully Convolutional Networks (2017) 16.63