Attention Mechanism In Speaker Recognition: What Does It Learn In Deep Speaker Embedding?
2018 · Qiongqiong Wang, Koji Okabe, Kong Aik Lee, et al.
Abstract
This paper presents an experimental study on deep speaker embedding with an attention mechanism, which has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a frame selector that computes an attention weight for each frame-level feature vector; the pooling layer of the speaker embedding network then uses these weights to produce an utterance-level representation. In general, an attention model is trained together with the speaker embedding network on a single objective function, and thus the two components are tightly bound to one another. In this paper, we consider the possibility that the attention model might be decoupled from its parent network and assist other speaker embedding networks and even conventional i-vector extractors. This possibility is demonstrated through a series of experiments on a NIST Speaker Recognition Evaluation (SRE) task, with 9.0% EER reduction and 3.8% min_Cprimary reduction w
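The frame-selection idea in the abstract can be illustrated with a minimal sketch of attention-weighted pooling. This is not the paper's implementation: the scoring function (a single tanh layer followed by a softmax), the parameter names `W`, `b`, `v`, the dimensions, and the random inputs are all assumptions chosen for illustration. The last step shows one plausible reading of "decoupling": weights computed from one set of frame-level features are reused to pool a different set of features over the same frames.

```python
import numpy as np


def attention_weights(frames, W, b, v):
    """Return one softmax-normalized weight per frame.

    frames: (T, D) frame-level feature vectors.
    W, b, v: hypothetical attention parameters of shapes (D, H), (H,), (H,).
    """
    hidden = np.tanh(frames @ W + b)   # (T, H) hidden activations
    scores = hidden @ v                # (T,) one scalar score per frame
    scores = scores - scores.max()     # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


def weighted_mean_pool(frames, weights):
    """Collapse (T, D) frames into a (D,) utterance-level vector."""
    return weights @ frames


rng = np.random.default_rng(0)
T, D, H = 200, 64, 32                  # frames, feature dim, attention dim (assumed)

frames = rng.standard_normal((T, D))   # stand-in for frame-level features
W = 0.1 * rng.standard_normal((D, H))
b = np.zeros(H)
v = rng.standard_normal(H)

weights = attention_weights(frames, W, b, v)
embedding = weighted_mean_pool(frames, weights)

# Decoupled use (assumption): the same per-frame weights pool features
# produced by a different front end over the same T frames.
other_frames = rng.standard_normal((T, 40))
other_embedding = weighted_mean_pool(other_frames, weights)
```

With uniform weights `1/T` the pooling reduces to a plain temporal average, so the attention model only changes the result to the extent that it favors some frames over others.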
Related papers
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) 18.88
- Self Multi-head Attention For Speaker Recognition (2019) 13.84
- H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model (2019) 0.00
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) 0.00
- End-to-end Attention Based Text-dependent Speaker Verification (2017) 14.87
- Phonetic-attention Scoring For Deep Speaker Features In Speaker Verification (2018) 2.26
- Deep Speaker Embeddings For Far-field Speaker Recognition On Short Utterances (2020) 11.29
- Multi-frequency Information Enhanced Channel Attention Module For Speaker Representation Learning (2022) 0.00