Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System
2020 · Soonshin Seo, Ji-Hwan Kim
Abstract
One of the most important parts of an end-to-end speaker verification system is speaker embedding generation. In our previous paper, we reported that shortcut connection-based multi-layer aggregation improves the representational power of the speaker embedding. However, the number of model parameters is relatively large, and unspecified variations increase with multi-layer aggregation. Therefore, we propose a self-attentive multi-layer aggregation with feature recalibration and normalization for an end-to-end speaker verification system. To reduce the number of model parameters, a ResNet with scaled channel width and layer depth is used as the baseline. To control variability during training, a self-attention mechanism is applied to perform the multi-layer aggregation, together with dropout regularization and batch normalization. Then, a feature recalibration layer, built from fully-connected layers and nonlinear activation functions, is applied to the aggregated feature. Deep length no…
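The abstract describes three stages applied after the ResNet: attention-weighted aggregation of features from several layers, recalibration with fully-connected layers and nonlinearities, and length normalization. The sketch below shows one plausible reading of that pipeline at inference time, in plain Python. It is not the authors' implementation. Several assumptions are labeled: each layer's output is assumed to be already pooled and projected to a common dimension; the attention score `w · tanh(h)` and the weights `w_att`, `w1`, `w2` are hypothetical; the recalibration follows a squeeze-and-excitation-style gate (FC, ReLU, FC, sigmoid); and the length-normalization scale of 10.0 is an illustrative value. Dropout and batch normalization are omitted because they matter mainly during training.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_aggregate(layer_feats: List[Vector], w: Vector) -> Vector:
    """Score each layer's pooled feature, softmax over layers, return the weighted sum.

    Assumption: all layer features share one dimension; `w` is a hypothetical
    learned scoring vector.
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in layer_feats]
    alphas = softmax(scores)
    dim = len(layer_feats[0])
    return [sum(a * h[d] for a, h in zip(alphas, layer_feats)) for d in range(dim)]


def recalibrate(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """SE-style gating (assumed form): FC -> ReLU -> FC -> sigmoid, then rescale x."""
    hidden = [max(0.0, z) for z in matvec(w1, x)]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w2, hidden)]
    return [g * xi for g, xi in zip(gates, x)]


def length_normalize(x: Vector, scale: float = 10.0) -> Vector:
    """L2-normalize, then multiply by a fixed scale (the value 10.0 is illustrative)."""
    norm = math.sqrt(sum(xi * xi for xi in x)) or 1.0
    return [scale * xi / norm for xi in x]


# Toy usage: three layer-level features of dimension 3 (hypothetical values).
layers = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0]]
w_att = [0.3, -0.2, 0.5]
w1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]    # 3 -> 2 bottleneck
w2 = [[0.5, 0.1], [0.2, 0.3], [-0.4, 0.6]]  # 2 -> 3 expansion
aggregated = self_attentive_aggregate(layers, w_att)
emb = length_normalize(recalibrate(aggregated, w1, w2))
print(len(emb), round(math.sqrt(sum(e * e for e in emb)), 6))
```

Placing the sigmoid gate after aggregation lets the model reweight individual embedding dimensions once the layer-level information has been mixed. The final scaling fixes the embedding norm, so downstream scoring compares directions rather than magnitudes.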
Related papers
- Analysis Of Length Normalization In End-to-end Speaker Verification System (2018) · 9.41
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84
- Joint Speaker Encoder And Neural Back-end Model For Fully End-to-end Automatic Speaker Verification With Multiple Enrollment Utterances (2022) · 0.00
- Improving Multi-scale Aggregation Using Feature Pyramid Module For Robust Speaker Verification Of Variable-duration Utterances (2020) · 10.48
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) · 0.00
- Double Multi-head Attention For Speaker Verification (2020) · 8.09
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017) · 8.60
- Graph Attentive Feature Aggregation For Text-independent Speaker Verification (2021) · 6.34