SpeechNAS: Towards Better Trade-off Between Latency And Accuracy For Large-scale Speaker Verification
2021 · Wentao Zhu, Tianlong Kong, Shun Lu, et al.
Abstract
Recently, the x-vector has become a successful and popular approach for speaker verification. It employs a time delay neural network (TDNN) and statistics pooling to extract a speaker-characterizing embedding from variable-length utterances. Improving upon the x-vector has been an active research area, and numerous neural networks have been elaborately designed based on it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we try to identify the optimal architectures from a TDNN-based search space by employing neural architecture search (NAS); we name the approach SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, from SpeechNAS-1 to SpeechNAS-5, of various numbers of parameters and GFLOPs on the large-s
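As an illustration of the statistics pooling step mentioned in the abstract, the following minimal sketch pools frame-level features of any length into a fixed-size vector by concatenating the per-dimension mean and standard deviation over time. The function name `statistics_pooling` and the toy inputs are assumptions made for this example and are not taken from the paper. In an actual x-vector system the frames would be TDNN outputs, and the high-order statistics pooling used by SpeechNAS would append further moments that are not shown here.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool a (num_frames x feat_dim) sequence into a vector of length 2 * feat_dim.

    The output concatenates the per-dimension mean and population standard
    deviation over time, so its size does not depend on the utterance length.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    feat_dim = len(frames[0])
    # First-order statistic: mean of each feature dimension across frames.
    means = [sum(f[d] for f in frames) / num_frames for d in range(feat_dim)]
    # Second-order statistic: standard deviation around that mean.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(feat_dim)
    ]
    return means + stds


# Two toy "utterances" of different lengths (hypothetical 2-dimensional features).
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
short_emb = statistics_pooling(short_utt)
long_emb = statistics_pooling(long_utt)
```

Both `short_emb` and `long_emb` have four entries even though the inputs contain two and four frames respectively, which is the property that lets later layers operate on variable-length utterances.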
Related papers
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020) 23.07
- Bayesian X-vector: Bayesian Neural Network Based X-vector System For Speaker Verification (2020) 6.77
- Evolutionary Algorithm Enhanced Neural Architecture Search For Text-independent Speaker Verification (2020) 8.09
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) 0.00
- P-vectors: A Parallel-coupled TDNN/Transformer Network For Speaker Verification (2023) 5.84
- EfficientTDNN: Efficient Architecture Search For Speaker Recognition (2021) 10.07
- An Improved Deep Neural Network For Modeling Speaker Characteristics At Different Temporal Scales (2020) 6.34
- RSKNet-MTSP: Effective And Portable Deep Architecture For Speaker Verification (2021) 9.03