Multi-frequency Information Enhanced Channel Attention Module For Speaker Representation Learning
2022 · Mufan Sang, John H. L. Hansen
Abstract
Recently, attention mechanisms have been applied successfully in neural network-based speaker verification systems. Incorporating the Squeeze-and-Excitation (SE) block into convolutional neural networks has achieved remarkable performance. However, the SE block uses global average pooling (GAP) to simply average the features along the time and frequency dimensions, which fails to preserve sufficient speaker information in the feature maps. In this study, we show mathematically that GAP is a special case of the discrete cosine transform (DCT) on the time-frequency domain, one that uses only the lowest frequency component of the frequency decomposition. To strengthen the extraction of speaker information, we propose to utilize multi-frequency information and design two novel and effective attention modules, called the Single-Frequency Single-Channel (SFSC) attention module and the Multi-Frequency Single-Channel (MFSC) attention module. The proposed attention modules can effectively capture more speaker information from
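The claim that GAP is the lowest-frequency case of a 2D DCT can be checked numerically. The sketch below is an illustration only, not the authors' SFSC or MFSC module. It assumes an unnormalized 2D DCT-II basis, a toy 3x4 feature map with made-up values, and hypothetical helper names (`dct2d_coefficient`, `global_average_pool`, `multi_frequency_descriptor`).

```python
import math


def dct2d_coefficient(x, h, w):
    """Unnormalized 2D DCT-II coefficient of feature map x at index (h, w).

    x is a list of rows, e.g. frequency bins by time frames (assumed layout).
    """
    H, W = len(x), len(x[0])
    total = 0.0
    for i in range(H):
        for j in range(W):
            basis = (math.cos(math.pi * h * (i + 0.5) / H)
                     * math.cos(math.pi * w * (j + 0.5) / W))
            total += x[i][j] * basis
    return total


def global_average_pool(x):
    """Mean of all entries of the feature map."""
    H, W = len(x), len(x[0])
    return sum(sum(row) for row in x) / (H * W)


def multi_frequency_descriptor(x, freqs):
    """Hypothetical helper: pool one channel with several DCT components."""
    return [dct2d_coefficient(x, h, w) for h, w in freqs]


# Toy single-channel feature map (illustrative values only).
feature_map = [
    [0.2, 1.0, -0.5, 0.3],
    [0.7, -0.1, 0.4, 0.9],
    [1.5, 0.0, -0.8, 0.6],
]
H, W = len(feature_map), len(feature_map[0])

# At (h, w) = (0, 0) every cosine equals 1, so the coefficient is the plain
# sum of the map, i.e. GAP scaled by the constant H * W.
lowest = dct2d_coefficient(feature_map, 0, 0)
gap = global_average_pool(feature_map)
assert math.isclose(lowest, gap * H * W)

# Higher-frequency components carry information that GAP discards.
print(multi_frequency_descriptor(feature_map, [(0, 0), (0, 1), (1, 0)]))
```

Because the (0, 0) basis function is constant, GAP and the lowest DCT component differ only by a scale factor. Pooling with further components, as in the last line, is one plausible way to read "multi-frequency information" here.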
Related papers
- Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning (2021) · 5.24
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022) · 7.50
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019) · 0.00
- Frequency And Multi-scale Selective Kernel Attention For Speaker Verification (2022) · 10.07
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020) · 23.07
- Attention And DCT Based Global Context Modeling For Text-independent Speaker Recognition (2022) · 7.50
- Speaker Representation Learning Using Global Context Guided Channel And Time-frequency Transformations (2020) · 6.34
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84