MFA-Conformer: Multi-scale Feature Aggregation Conformer For Automatic Speaker Verification
2022 · Yang Zhang, Zhiqiang Lv, Haibin Wu, et al.
Abstract
In this paper, we present the Multi-scale Feature Aggregation Conformer (MFA-Conformer), a simple, effective, and easy-to-implement backbone for automatic speaker verification based on the Convolution-augmented Transformer (Conformer). The architecture of the MFA-Conformer is inspired by recent state-of-the-art models in speech recognition and speaker verification. First, we introduce a convolution subsampling layer to decrease the computational cost of the model. Second, we adopt Conformer blocks, which combine Transformers and convolutional neural networks (CNNs) to capture global and local features effectively. Finally, the output feature maps from all Conformer blocks are concatenated to aggregate multi-scale representations before final pooling. We evaluate the MFA-Conformer on widely used benchmarks. The best system obtains 0.64%, 1.29%, and 1.63% EER on the VoxCeleb1-O, SITW.Dev, and SITW.Eval sets, respectively. MFA-Conformer significantly outperforms the popular ECAPA-TDNN systems i
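The three steps in the abstract (subsampling, a stack of blocks, concatenation of every block's output before pooling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: `subsample`, `make_toy_block`, and `mfa_forward` are hypothetical names, frame dropping stands in for the convolution subsampling layer, the toy blocks contain no attention or convolution, and plain mean pooling stands in for the unspecified "final pooling". Only the data flow is meant to match the description.

```python
from typing import Callable, List, Tuple

Frames = List[List[float]]  # frame-level features, shape (time, feature)


def subsample(frames: Frames, factor: int = 2) -> Frames:
    """Assumed stand-in for convolution subsampling: keep every `factor`-th frame."""
    return frames[::factor]


def make_toy_block(scale: float) -> Callable[[Frames], Frames]:
    """Hypothetical placeholder for a Conformer block (no attention, no convolution)."""
    def block(frames: Frames) -> Frames:
        return [[scale * x + 1.0 for x in frame] for frame in frames]
    return block


def mfa_forward(
    frames: Frames,
    blocks: List[Callable[[Frames], Frames]],
    factor: int = 2,
) -> Tuple[Frames, List[float]]:
    """Run the blocks in sequence, concatenate all block outputs, then pool."""
    hidden = subsample(frames, factor)
    if not hidden or not blocks:
        raise ValueError("need at least one frame and one block")

    per_block_outputs = []
    for block in blocks:
        hidden = block(hidden)            # each block consumes the previous output
        per_block_outputs.append(hidden)  # keep every block's output, not just the last

    # Multi-scale aggregation: join the outputs of all blocks along the feature axis.
    concatenated = []
    for t in range(len(hidden)):
        joined = []
        for output in per_block_outputs:
            joined.extend(output[t])
        concatenated.append(joined)

    # Assumed pooling: average over time to get one utterance-level vector.
    num_frames = len(concatenated)
    pooled = [sum(column) / num_frames for column in zip(*concatenated)]
    return concatenated, pooled


if __name__ == "__main__":
    toy_input = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    features, embedding = mfa_forward(toy_input, [make_toy_block(1.0), make_toy_block(2.0)])
    print(len(features), len(features[0]), embedding)
```

With two blocks and 2-dimensional inputs, each aggregated frame has 4 features: the feature dimension grows with the number of blocks, while subsampling halves the number of frames.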
Related papers
- Improving Speaker Representations Using Contrastive Losses On Multi-scale Features (2024)
- MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances (2022)
- Whisper-PMFA: Partial Multi-scale Feature Aggregation For Speaker Verification Using Whisper Models (2024)
- Improving Multi-scale Aggregation Using Feature Pyramid Module For Robust Speaker Verification Of Variable-duration Utterances (2020)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)
- Leveraging ASR Pretrained Conformers For Speaker Verification Through Transfer Learning And Knowledge Distillation (2023)
- NeXt-TDNN: Modernizing Multi-scale Temporal Convolution Backbone For Speaker Verification (2023)
- HM-Conformer: A Conformer-based Audio Deepfake Detection System With Hierarchical Pooling And Multi-level Classification Token Aggregation Methods (2023)