Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning
2021 · Li Zhang, Qing Wang, Lei Xie
Abstract
Channel-wise attention in CNN-based speaker representation networks has achieved remarkable performance in speaker verification (SV). However, these approaches simply average the feature maps over time and frequency before learning channel-wise attention, and so ignore the essential mutual interaction among the temporal, channel, and frequency scales. To address this problem, we propose the Duality Temporal-Channel-Frequency (DTCF) attention, which re-calibrates the channel-wise features by aggregating global context along the temporal and frequency dimensions. Specifically, the duality attention, consisting of time-channel (T-C) attention and frequency-channel (F-C) attention, aims to focus on salient regions of the T-C and F-C feature maps that may have a more considerable impact on the global context, leading to more discriminative speaker representations. We evaluate the effectiveness of the proposed DTCF attention on the CN-Celeb and VoxCeleb datasets. On the CN-Celeb evaluation set, the
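The contrast the abstract draws, between averaging over time and frequency together and pooling along one axis at a time, can be illustrated with a minimal sketch. This is not the authors' architecture: the learned layers a real attention module would place between pooling and the sigmoid are omitted, combining the two gates by multiplication is an assumption, and all function names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def global_channel_gate(x):
    # Baseline style: average each channel over frequency AND time,
    # giving a single gate per channel. x is indexed as x[c][f][t].
    return [sigmoid(mean(v for row in ch for v in row)) for ch in x]

def tc_fc_gates(x):
    # T-C map: pool over frequency only, one gate per (channel, frame).
    # F-C map: pool over time only, one gate per (channel, frequency bin).
    # Assumption: gates come straight from the pooled values (no learned weights).
    n_time = len(x[0][0])
    tc = [[sigmoid(mean(row[t] for row in ch)) for t in range(n_time)] for ch in x]
    fc = [[sigmoid(mean(row)) for row in ch] for ch in x]
    return tc, fc

def recalibrate(x, tc, fc):
    # Assumption: both gates are applied multiplicatively to the input.
    return [[[v * tc[c][t] * fc[c][f] for t, v in enumerate(row)]
             for f, row in enumerate(ch)]
            for c, ch in enumerate(x)]

# Toy feature map: 2 channels, 2 frequency bins, 3 frames.
x = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
     [[0.0, 2.0, 4.0], [0.0, 2.0, 4.0]]]
tc, fc = tc_fc_gates(x)
y = recalibrate(x, tc, fc)
```

In this toy input the second channel grows over time. The global gate collapses that channel to one value, while the T-C gate keeps a separate value per frame.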
Related papers
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022) · 7.50
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019) · 0.00
- Multi-frequency Information Enhanced Channel Attention Module For Speaker Representation Learning (2022) · 0.00
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020) · 23.07
- Speaker Representation Learning Using Global Context Guided Channel And Time-frequency Transformations (2020) · 6.34
- Attention And DCT Based Global Context Modeling For Text-independent Speaker Recognition (2022) · 7.50
- Efficient Encoder-decoder And Dual-path Conformer For Comprehensive Feature Learning In Speech Enhancement (2023) · 7.16
- MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances (2022) · 13.79