Exploring A Unified Attention-based Pooling Framework For Speaker Verification
2018 · Yi Liu, Liang He, Weiwei Liu, et al.
Abstract
The pooling layer is an essential component in neural-network-based speaker verification. Most current speaker verification networks use average pooling to derive utterance-level speaker representations. Average pooling treats every frame as equally important, which is suboptimal because speaker-discriminant power varies across speech segments. In this paper, we present a unified attention-based pooling framework and combine it with multi-head attention. Experiments on the Fisher and NIST SRE 2010 datasets show that using outputs from lower layers to compute the attention weights can outperform average pooling and achieves better results than the vanilla attention method. Multi-head attention further improves performance.
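The abstract contrasts average pooling with attention-based pooling and a multi-head extension. Below is a minimal pure-Python sketch of the three, assuming an additive (tanh) scoring function, attention "keys" that may come from a lower layer than the pooled "values", and heads that share `w` and `b` but each have their own scoring vector `v`. These choices and all names (`attention_pooling`, `keys`, `values`, `w`, `b`, `v`, `heads`) are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def average_pooling(values: Matrix) -> List[float]:
    """Utterance-level vector as the plain mean over frames (each frame weighted 1/T)."""
    num_frames = len(values)
    dim = len(values[0])
    return [sum(frame[d] for frame in values) / num_frames for d in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the frame axis."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(keys: Matrix, w: Matrix, b: Sequence[float],
                      v: Sequence[float]) -> List[float]:
    """Additive scores e_t = v . tanh(W k_t + b), normalized with softmax over frames."""
    scores = []
    for k in keys:
        # Hidden projection of one key frame: one tanh unit per row of W.
        hidden = [math.tanh(sum(w_row[j] * k[j] for j in range(len(k))) + b_i)
                  for w_row, b_i in zip(w, b)]
        scores.append(sum(v_i * h_i for v_i, h_i in zip(v, hidden)))
    return softmax(scores)


def attention_pooling(keys: Matrix, values: Matrix, w: Matrix,
                      b: Sequence[float], v: Sequence[float]) -> List[float]:
    """Weighted mean of the value frames; keys may come from a different (e.g. lower) layer."""
    alpha = attention_weights(keys, w, b, v)
    dim = len(values[0])
    return [sum(a * frame[d] for a, frame in zip(alpha, values)) for d in range(dim)]


def multi_head_attention_pooling(keys: Matrix, values: Matrix, w: Matrix,
                                 b: Sequence[float],
                                 heads: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical multi-head variant: one scoring vector per head, outputs concatenated."""
    pooled: List[float] = []
    for v in heads:
        pooled.extend(attention_pooling(keys, values, w, b, v))
    return pooled


# Toy data: 3 frames, 2-dim last-layer outputs (values) and 2-dim lower-layer outputs (keys).
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
keys = [[0.5, -0.5], [0.1, 0.3], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

print(average_pooling(values))  # [1.0, 1.0]
print(attention_pooling(keys, values, w, b, [1.0, 1.0]))
print(multi_head_attention_pooling(keys, values, w, b, [[1.0, 1.0], [-1.0, 0.5]]))
```

In this sketch, setting `v` to all zeros gives every frame the same score, so attention pooling collapses to average pooling; the learned weights are what let informative frames count for more than 1/T.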
Related papers
- Double Multi-head Attention For Speaker Verification (2020) · 8.09
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) · 18.88
- Study On The Temporal Pooling Used In Deep Neural Networks For Speaker Verification (2021) · 5.84
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) · 0.00
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020) · 0.00
- End-to-end Attention Based Text-dependent Speaker Verification (2017) · 14.87
- Frequency And Multi-scale Selective Kernel Attention For Speaker Verification (2022) · 10.07