SpeakerNet: 1D Depth-wise Separable Convolutional Network For Text-independent Speaker Recognition And Verification
2020 · Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, et al.
Abstract
We propose SpeakerNet, a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch normalization, and ReLU layers. The architecture uses an x-vector based statistics pooling layer to map variable-length utterances to a fixed-length embedding (q-vector). SpeakerNet-M is a simple, lightweight model with just 5M parameters. It does not use voice activity detection (VAD) and achieves close to state-of-the-art performance, scoring an Equal Error Rate (EER) of 2.10% on the VoxCeleb1 cleaned and 2.29% on the VoxCeleb1 trial files.
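The two building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy kernels, and the use of the population standard deviation are assumptions made for illustration, and batch normalization, ReLU, and the residual connections are omitted. It shows a 1D depth-wise separable convolution (a per-channel filter followed by a 1x1 channel mix) and a statistics pooling step that turns any number of frames into a fixed-length vector.

```python
import math


def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """Hypothetical helper, not from the paper.

    x: list of C_in channels, each a list of T frames.
    depthwise_kernels: one 1D kernel of length K per input channel.
    pointwise_weights: C_out rows of C_in weights (a 1x1 convolution).
    No padding is applied, so the output has T - K + 1 frames.
    """
    # Depth-wise step: each channel is filtered by its own kernel only.
    filtered = []
    for channel, kernel in zip(x, depthwise_kernels):
        k = len(kernel)
        filtered.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(len(channel) - k + 1)
        ])
    # Point-wise step: mix channels independently at every frame.
    frames = len(filtered[0])
    return [
        [sum(w * filtered[c][t] for c, w in enumerate(row)) for t in range(frames)]
        for row in pointwise_weights
    ]


def statistics_pooling(features):
    """Concatenate per-channel mean and standard deviation over time.

    Assumption: population standard deviation (divide by the frame count).
    """
    means = [sum(ch) / len(ch) for ch in features]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch))
        for ch, m in zip(features, means)
    ]
    return means + stds


# Toy inputs: two channels, utterances of 4 and 6 frames.
short_utt = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
long_utt = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]]
dw = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # one kernel per input channel
pw = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 input -> 3 output channels

for utt in (short_utt, long_utt):
    embedding = statistics_pooling(depthwise_separable_conv1d(utt, dw, pw))
    print(len(embedding))
```

Both utterances yield a vector of 2 x C_out values even though their frame counts differ, which is the property that lets a pooling layer of this kind feed a fixed-size embedding layer.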
Related papers
- TitaNet: Neural Model For Speaker Representation With 1D Depth-wise Separable Convolutions And Global Context (2021)
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017)
- RSKNet-MTSP: Effective And Portable Deep Architecture For Speaker Verification (2021)
- Unified Hypersphere Embedding For Speaker Recognition (2018)
- A Deep Neural Network For Short-segment Speaker Recognition (2019)
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020)
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017)