Discriminative Speaker Representation Via Contrastive Learning With Class-aware Attention In Angular Space
2022 · Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng
Abstract
Applying contrastive learning to speaker verification (SV) faces two challenges: the softmax-based contrastive loss lacks discriminative power, and hard negative pairs can easily influence learning. To overcome the first challenge, we propose a contrastive learning SV framework that incorporates an additive angular margin into the supervised contrastive loss; the margin improves the discriminative ability of the speaker representation. For the second challenge, we introduce a class-aware attention mechanism through which hard negative samples contribute less to the supervised contrastive loss. We also employ gradient-based multi-objective optimization to balance the classification and contrastive losses. Experimental results on CN-Celeb and VoxCeleb1 show that this new learning objective helps the encoder find an embedding space with strong speaker discrimination across languages.
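The abstract does not give the loss formula, so the following is only a minimal sketch of one common way to place an additive angular margin inside a supervised contrastive loss. It is not the paper's formulation. The function name `aam_supcon_loss`, the `margin` and `scale` defaults, and the choice of denominator (margin-penalized positive plus all negatives) are assumptions made for illustration, and the class-aware attention weighting is deliberately left out.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_supcon_loss(embeddings, labels, margin=0.2, scale=30.0):
    """Supervised contrastive loss with an additive angular margin (sketch).

    For each anchor i and each same-speaker positive p, the positive logit is
    scale * cos(theta_ip + margin) instead of scale * cos(theta_ip).
    Negatives (different speakers) keep their plain scaled cosine.
    All names and defaults here are illustrative assumptions.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Cosine similarity to every sample, clamped for a safe acos.
        cos = [
            max(-1.0, min(1.0, sum(a * b for a, b in zip(z[i], z[j]))))
            for j in range(n)
        ]
        neg_sum = sum(
            math.exp(scale * cos[j]) for j in range(n) if labels[j] != labels[i]
        )
        loss_i = 0.0
        for p in positives:
            theta = math.acos(cos[p])
            pos_logit = scale * math.cos(theta + margin)
            # -log( exp(pos) / (exp(pos) + sum of negative exponentials) )
            loss_i += -pos_logit + math.log(math.exp(pos_logit) + neg_sum)
        total += loss_i / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two utterances from each of two speakers.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
lab = [0, 0, 1, 1]
print(aam_supcon_loss(emb, lab, margin=0.0))
print(aam_supcon_loss(emb, lab, margin=0.2))
```

With a positive margin, the same batch yields a larger loss than with `margin=0.0`, because same-speaker pairs must be closer in angle to reach the same logit. That extra pressure is the intuition behind using the margin to improve discrimination.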
Related papers
- Speaker Representation Learning Via Contrastive Loss With Maximal Speaker Separability (2022) · 10.68
- A Study On Angular Based Embedding Learning For Text-independent Speaker Verification (2019) · 2.26
- Experimenting With Additive Margins For Contrastive Self-supervised Speaker Verification (2023) · 4.52
- Additive Margin In Contrastive Self-supervised Frameworks To Learn Discriminative Speaker Representations (2024) · 2.26
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023) · 5.84
- Angular Softmax Loss For End-to-end Speaker Verification (2018) · 11.19
- Self-supervised Learning Of Audio Representations Using Angular Contrastive Loss (2022) · 5.84
- Self-supervised Text-independent Speaker Verification Using Prototypical Momentum Contrastive Learning (2020) · 12.93