MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances
2022 · Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, et al.
Abstract
The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification. However, TDNNs require a large number of filters to capture the speaker characteristics at any local frequency region. In addition, the performance of such systems may degrade under short utterance scenarios. To address these issues, we propose multi-scale frequency-channel attention (MFA), in which we characterize speakers at different scales through a novel dual-path design consisting of a convolutional neural network and a TDNN. We evaluate the proposed MFA on the VoxCeleb database and observe that the framework with MFA achieves state-of-the-art performance while reducing the number of parameters and the computational complexity. Further, the MFA mechanism is found to be effective for speaker verification with short test utterances.
Related papers
- MGFF-TDNN: A Multi-granularity Feature Fusion TDNN Model With Depth-wise Separable Module For Speaker Verification (2025) 0.00
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020) 23.07
- DS-TDNN: Dual-stream Time-delay Neural Network With Global-aware Filter For Speaker Verification (2023) 8.60
- MFA-Conformer: Multi-scale Feature Aggregation Conformer For Automatic Speaker Verification (2022) 15.46
- CAM++: A Fast And Efficient Network For Speaker Verification Using Context-aware Masking (2023) 15.57
- P-vectors: A Parallel-coupled TDNN/Transformer Network For Speaker Verification (2023) 5.84
- MACCIF-TDNN: Multi Aspect Aggregation Of Channel And Context Interdependence Features In TDNN-based Speaker Verification (2021) 6.77
- FDN: Finite Difference Network With Hierarchical Convolutional Features For Text-independent Speaker Verification (2021) 0.00