CA-MHFA: A Context-aware Multi-head Factorized Attentive Pooling For SSL-based Speaker Verification
2024 · Junyi Peng, Ladislav Mošner, Lin Zhang, et al.
Abstract
Self-supervised learning (SSL) models for speaker verification (SV) have gained significant attention in recent years. However, existing SSL-based SV systems often struggle to capture local temporal dependencies and generalize across different tasks. In this paper, we propose context-aware multi-head factorized attentive pooling (CA-MHFA), a lightweight framework that incorporates contextual information from surrounding frames. CA-MHFA leverages grouped, learnable queries to effectively model contextual dependencies while maintaining efficiency by sharing keys and values across groups. Experimental results on the VoxCeleb dataset show that CA-MHFA achieves EERs of 0.42%, 0.48%, and 0.96% on Vox1-O, Vox1-E, and Vox1-H, respectively, outperforming complex models like WavLM-TDNN with fewer parameters and faster convergence. Additionally, CA-MHFA demonstrates strong generalization across multiple SSL models and tasks, including emotion recognition and anti-spoofing, highlighting its robust
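The pooling idea in the abstract (grouped learnable queries that look at surrounding frames, with keys and values shared across groups) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `context_aware_pooling`, the tensor shapes, the zero padding at utterance edges, the scaled dot-product scoring, and the choice of one query per context offset are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def context_aware_pooling(keys, values, queries):
    """Pool frame-level features into one utterance-level vector.

    keys:    (T, D)    frame-level keys, shared by every query group
    values:  (T, Dv)   frame-level values, shared by every query group
    queries: (G, W, D) G groups of W queries; query w of a group is matched
                       against the frame at offset w - W//2 (assumed layout)
    Returns the concatenated group outputs (G * Dv,) and the attention
    weights (G, T).
    """
    T, D = keys.shape
    G, W, _ = queries.shape
    half = W // 2

    # Assumption: zero-pad in time so edge frames also have W neighbours.
    padded = np.pad(keys, ((half, half), (0, 0)))

    # scores[g, t] = sum over offsets w of <queries[g, w], padded[t + w]>
    scores = np.zeros((G, T))
    for w in range(W):
        scores += queries[:, w, :] @ padded[w:w + T].T

    # One attention distribution over frames per group.
    weights = softmax(scores / np.sqrt(D), axis=1)

    # Every group pools the same shared values with its own weights.
    pooled = weights @ values  # (G, Dv)
    return pooled.reshape(-1), weights


# Usage with random stand-ins for SSL-derived keys and values.
rng = np.random.default_rng(0)
keys = rng.normal(size=(50, 16))      # 50 frames, 16-dim keys
values = rng.normal(size=(50, 8))     # 50 frames, 8-dim values
queries = rng.normal(size=(4, 3, 16)) # 4 groups, context width 3
embedding, attn = context_aware_pooling(keys, values, queries)
print(embedding.shape, attn.shape)
```

With a context width of 1 the sketch reduces to ordinary multi-query attentive pooling, which is one way to see what the added context contributes. Because keys and values are computed once and reused by all groups, only the queries grow with the number of groups.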
Related papers
- ACA-Net: Towards Lightweight Speaker Verification Using Asymmetric Cross Attention (2023)
- Double Multi-head Attention For Speaker Verification (2020)
- CA-SSLR: Condition-aware Self-supervised Learning Representation For Generalized Speech Processing (2024)
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)
- Additive Margin In Contrastive Self-supervised Frameworks To Learn Discriminative Speaker Representations (2024)
- Exploring A Unified Attention-based Pooling Framework For Speaker Verification (2018)
- One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification (2023)
- BSS-CFFMA: Cross-domain Feature Fusion And Multi-attention Speech Enhancement Network Based On Self-supervised Embedding (2024)