Deep Representation Decomposition For Rate-invariant Speaker Verification
2022 · Fuchuan Tong, Siqi Zheng, Haodong Zhou, et al.
Abstract
Deep speaker embeddings have achieved promising performance for speaker verification, but their advantage diminishes under speaking-style variability. Speaking rate mismatch is often observed in practical speaker verification systems and can degrade system performance. To reduce the intra-class discrepancy caused by speaking rate, we propose a deep representation decomposition approach with adversarial learning to learn speaking rate-invariant speaker embeddings. Specifically, using an attention block, we decompose the original embedding into an identity-related component and a rate-related component through multi-task training. Additionally, to reduce the latent relationship between the two decomposed components, we propose a cosine mapping block that trains the parameters adversarially to minimize the cosine similarity between them. As a result, identity-related features become robust to speaking rate and then are used fo
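The decomposition idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture, which is not described in this excerpt. It assumes a simple per-dimension sigmoid gate as a stand-in for the attention block, and it only computes the cosine similarity that the adversarial objective would minimize. The names `decompose`, `cosine_similarity`, `weights`, and `bias` are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decompose(embedding, weights, bias):
    """Split an embedding into two parts with a soft per-dimension gate.

    Assumption: the gate is sigmoid(W e + b). The paper's attention block
    may be structured differently.
    """
    dim = len(embedding)
    gate = [
        sigmoid(sum(weights[i][j] * embedding[j] for j in range(dim)) + bias[i])
        for i in range(dim)
    ]
    # The two parts sum back to the original embedding by construction.
    identity_part = [g * e for g, e in zip(gate, embedding)]
    rate_part = [(1.0 - g) * e for g, e in zip(gate, embedding)]
    return identity_part, rate_part


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two vectors; eps guards against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


# Toy data: a random 8-dimensional "speaker embedding" and untrained gate weights.
rng = random.Random(0)
dim = 8
embedding = [rng.gauss(0.0, 1.0) for _ in range(dim)]
weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
bias = [0.0] * dim

identity_part, rate_part = decompose(embedding, weights, bias)

# Quantity a training loop would push toward zero to decorrelate the two parts.
similarity = cosine_similarity(identity_part, rate_part)
print(f"cosine similarity between components: {similarity:.4f}")
```

In a full system the gate parameters would be learned jointly with a speaker classification loss on the identity part and a rate-related loss on the rate part, which is the multi-task setup the abstract mentions. Those losses are omitted here.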
Related papers
- Intra-class Variation Reduction Of Speaker Representation In Disentanglement Framework (2020)
- Deep Segment Attentive Embedding For Duration Robust Speaker Verification (2018)
- DEAAN: Disentangled Embedding And Adversarial Adaptation Network For Robust Speaker Representation Learning (2020)
- Variable Frame Rate-based Data Augmentation To Handle Speaking-style Variability For Automatic Speaker Verification (2020)
- Attention-based Conditioning Methods Using Variable Frame Rate For Style-robust Speaker Verification (2022)
- Disentangled Speaker And Nuisance Attribute Embedding For Robust Speaker Verification (2020)
- Robust Speaker Recognition Using Unsupervised Adversarial Invariance (2019)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)