Double Multi-head Attention For Speaker Verification
2020 · Miquel India, Pooyan Safari, Javier Hernando
Abstract
Most state-of-the-art Deep Learning systems for speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention pooling, which extends our previous approach based on Self Multi-Head Attention. An additional self-attention layer is added to the pooling layer to summarize the context vectors produced by Multi-Head Attention into a single speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, which results in more discriminative speaker embeddings. We have evaluated our approach on the VoxCeleb2 dataset. Our results show 6.09% and 5.23% relative improvement in terms of EER compared to Self Attention pooling and Self Multi-Head Attention, respectively. According to the obtained
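The pooling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scaled dot-product scoring against one learnable query vector per head, and random vectors stand in for trained parameters and for the front-end's frame-level features. All names (`attention_pool`, `double_mha_pool`, `head_queries`, `final_query`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Weighted average of `vectors`; weights come from scaled
    dot-product scores against `query` (assumed scoring function)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(v, query) / scale for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def double_mha_pool(frames, head_queries, final_query):
    """First attention: each head pools its slice of every frame over time.
    Second attention: the per-head context vectors are pooled into one vector."""
    num_heads = len(head_queries)
    head_dim = len(frames[0]) // num_heads
    contexts = []
    for k, query in enumerate(head_queries):
        lo, hi = k * head_dim, (k + 1) * head_dim
        sub_frames = [f[lo:hi] for f in frames]  # head k's slice of each frame
        context, _ = attention_pool(sub_frames, query)
        contexts.append(context)
    # Weight the heads themselves and sum them into a single embedding.
    embedding, head_weights = attention_pool(contexts, final_query)
    return embedding, head_weights


# Toy data: T frames of dimension D, split across K heads (all values assumed).
random.seed(0)
T, D, K = 50, 8, 4
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
head_queries = [[random.gauss(0, 1) for _ in range(D // K)] for _ in range(K)]
final_query = [random.gauss(0, 1) for _ in range(D // K)]

embedding, head_weights = double_mha_pool(frames, head_queries, final_query)
```

In this sketch the output length is `D // K` regardless of the number of frames `T`, which is how a variable-length utterance ends up as a fixed-length vector; `head_weights` shows how much each head contributes to the final embedding.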
Related papers
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84
- Exploring A Unified Attention-based Pooling Framework For Speaker Verification (2018) · 6.77
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) · 18.88
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019) · 0.00
- Attention Back-end For Automatic Speaker Verification With Multiple Enrollment Utterances (2021) · 10.21
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020) · 0.00
- CA-MHFA: A Context-aware Multi-head Factorized Attentive Pooling For SSL-based Speaker Verification (2024) · 6.34
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022) · 7.50