ACA-Net: Towards Lightweight Speaker Verification Using Asymmetric Cross Attention
2023 · Jia Qi Yip, Tuan Truong, Dianwen Ng, et al.
Abstract
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA distills large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability. Existing SV models, by contrast, pool over the temporal dimension with a fixed function, which may obscure information about the signal's non-stationary temporal variability. Our experiments on WSJ0-1talker show that ACA-Net outperforms a strong baseline with a 5% relative improvement in EER while using only 1/5 of the parameters.
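The core idea in the abstract, a small query attending to large key and value matrices, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, omits the learned query/key/value projections and the MLA block, and uses random numbers in place of real acoustic features. The function name `asymmetric_cross_attention` and the sizes `N_LATENTS` and `DIM` are hypothetical, chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_cross_attention(latents, frames):
    """Attend a small latent query (N x d) to a long input sequence (T x d).

    Simplifying assumptions: one head, no learned projections, so the
    frames serve as both keys and values. The output is always N x d,
    whatever the sequence length T is.
    """
    d = len(latents[0])
    keys_t = [list(col) for col in zip(*frames)]        # transpose: d x T
    scores = matmul(latents, keys_t)                    # N x T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, frames)                      # N x d


def random_frames(num_frames, dim, rng):
    """Stand-in for acoustic features: num_frames vectors of size dim."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]


N_LATENTS, DIM = 4, 8          # assumed sizes, not taken from the paper
rng = random.Random(0)
latents = random_frames(N_LATENTS, DIM, rng)   # would be learned in practice

short_out = asymmetric_cross_attention(latents, random_frames(50, DIM, rng))
long_out = asymmetric_cross_attention(latents, random_frames(300, DIM, rng))

# Both outputs have the same fixed shape despite different input lengths.
print(len(short_out), len(short_out[0]))
print(len(long_out), len(long_out[0]))
```

Because the attention weights depend on the input frames, the aggregation adapts to each utterance, unlike a fixed pooling function such as a plain temporal mean. The cost of the score matrix grows with N times T rather than T squared, which is where the asymmetry helps when N is much smaller than T.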
Related papers
- CA-MHFA: A Context-aware Multi-head Factorized Attentive Pooling For SSL-based Speaker Verification (2024) 6.34
- Speaker Verification Using Attentive Multi-scale Convolutional Recurrent Network (2023) 0.00
- Multi-task Network For Noise-robust Keyword Spotting And Speaker Verification Using CTC-based Soft VAD And Global Query Attention (2020) 9.41
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020) 0.00
- Frequency And Multi-scale Selective Kernel Attention For Speaker Verification (2022) 10.07
- Attentive Statistics Pooling For Deep Speaker Embedding (2018) 18.88
- CAM++: A Fast And Efficient Network For Speaker Verification Using Context-aware Masking (2023) 15.57
- RSKNet-MTSP: Effective And Portable Deep Architecture For Speaker Verification (2021) 9.03