Decomposed Temporal Dynamic CNN: Efficient Time-adaptive Network For Text-independent Speaker Verification Explained With Speaker Activation Map
2022 · Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park
Abstract
To extract accurate speaker information for text-independent speaker verification, temporal dynamic CNNs (TDY-CNNs), which adapt kernels to each time bin, were proposed. However, the model size of TDY-CNN is too large, and the adaptive kernel's degree of freedom is limited. To address these limitations, we propose decomposed temporal dynamic CNNs (DTDY-CNNs), which form a time-adaptive kernel by combining a static kernel with a dynamic residual based on matrix decomposition. The proposed DTDY-ResNet-34(x0.50), using attentive statistical pooling without data augmentation, achieves an EER of 0.96%, which is better than other state-of-the-art methods. DTDY-CNNs are a successful upgrade of TDY-CNNs, reducing the model size by 64% and improving performance. We also show that DTDY-CNNs extract more accurate frame-level speaker embeddings than TDY-CNNs. The detailed behavior of DTDY-ResNet-34(x0.50) in extracting speaker information was analyzed using the speaker activation map (SAM) produced by modified grad
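The idea of forming a time-adaptive kernel from a static kernel plus a dynamic residual can be illustrated with a minimal NumPy sketch. It assumes a pointwise (1x1) convolution and a residual of the form W(t) = W0 + P Φ(t) Qᵀ; the names `W0`, `P`, `Q`, `A`, the latent size `L`, and the tanh-based generator for Φ(t) are hypothetical choices made for illustration and are not taken from the abstract.

```python
import numpy as np

def decomposed_dynamic_pointwise(x, W0, P, Q, A):
    """Apply a per-frame kernel W(t) = W0 + P @ Phi(t) @ Q.T to each time bin.

    x:  (C_in, T) input features, one column per time bin
    W0: (C_out, C_in) static kernel shared by all time bins
    P:  (C_out, L) and Q: (C_in, L) static projection matrices (assumed form)
    A:  (L*L, C_in) hypothetical linear generator for the dynamic matrix Phi(t)
    """
    T = x.shape[1]
    L = P.shape[1]
    # Assumption: Phi(t) is produced from the frame itself by a small map.
    phi = np.tanh(A @ x).T.reshape(T, L, L)      # (T, L, L)
    static = W0 @ x                              # (C_out, T)
    z = Q.T @ x                                  # project to latent space: (L, T)
    z = np.einsum("tij,jt->it", phi, z)          # per-frame mixing in latent space
    dynamic = P @ z                              # back to output channels: (C_out, T)
    return static + dynamic, phi

def explicit_kernel(W0, P, Q, phi_t):
    """Materialise the full time-adaptive kernel for one time bin."""
    return W0 + P @ phi_t @ Q.T

# Small usage example with random values (shapes are arbitrary).
rng = np.random.default_rng(0)
C_in, C_out, L, T = 4, 6, 2, 5
x = rng.standard_normal((C_in, T))
W0 = rng.standard_normal((C_out, C_in))
P = rng.standard_normal((C_out, L))
Q = rng.standard_normal((C_in, L))
A = rng.standard_normal((L * L, C_in))
y, phi = decomposed_dynamic_pointwise(x, W0, P, Q, A)
```

In this sketch the full kernel never has to be built per frame: only the small L x L matrix Φ(t) changes over time, which suggests how a decomposition of this kind could keep the parameter count low while still adapting to each time bin.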
Related papers
- Temporal Dynamic Convolutional Neural Network For Text-independent Speaker Verification And Phonemetic Analysis (2021)
- ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification (2020)
- Dynamic Kernels And Channel Attention For Low Resource Speaker Verification (2022)
- NeXt-TDNN: Modernizing Multi-scale Temporal Convolution Backbone For Speaker Verification (2023)
- CAM++: A Fast And Efficient Network For Speaker Verification Using Context-aware Masking (2023)
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019)
- Deep Speaker Feature Learning For Text-independent Speaker Verification (2017)
- DS-TDNN: Dual-stream Time-delay Neural Network With Global-aware Filter For Speaker Verification (2023)