Augmentation Adversarial Training For Self-supervised Speaker Recognition
2020 · Jaesung Huh, Hee Soo Heo, Jingu Kang, et al.
Abstract
The goal of this work is to train robust speaker recognition models without speaker labels. Recent work on unsupervised speaker representations is based on contrastive learning, which encourages within-utterance embeddings to be similar and across-utterance embeddings to be dissimilar. However, since the within-utterance segments share the same acoustic characteristics, it is difficult to separate the speaker information from the channel information. To this end, we propose an augmentation adversarial training strategy that trains the network to be discriminative for the speaker information while remaining invariant to the augmentation applied. Since the augmentation simulates the acoustic characteristics, training the network to be invariant to augmentation also encourages the network to be invariant to the channel information in general. Extensive experiments on the VoxCeleb and VOiCES datasets show significant improvements over previous works using self-supervision, and the performance
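The two objectives described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the softmax-based contrastive loss, the linear augmentation classifier `w`, the temperature, and the weight `lam` are all hypothetical choices made for the example. The abstract does not specify the exact loss formulation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(emb_a, emb_b, temperature=0.1):
    # emb_a[i] and emb_b[i] are assumed to come from two segments of the
    # same utterance (positives); all other rows act as negatives.
    sim = l2_normalize(emb_a) @ l2_normalize(emb_b).T / temperature
    return -np.mean(np.diag(log_softmax(sim)))

def augmentation_loss(emb, aug_labels, w):
    # Cross-entropy of an assumed linear classifier that tries to predict
    # which augmentation type was applied to each segment.
    logp = log_softmax(emb @ w)
    return -np.mean(logp[np.arange(len(aug_labels)), aug_labels])

def embedding_objective(emb_a, emb_b, aug_labels, w, lam=1.0):
    # The embedding network minimises the speaker loss while maximising
    # the augmentation classifier's loss (the adversarial term), so the
    # augmentation loss enters with a negative sign.
    l_spk = contrastive_loss(emb_a, emb_b)
    l_aug = augmentation_loss(emb_a, aug_labels, w)
    return l_spk - lam * l_aug, l_spk, l_aug
```

In an actual training loop the augmentation classifier would be updated to minimise its own loss while the embedding network receives the reversed gradient; that alternation is omitted here because the sketch only evaluates the loss values.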
Related papers
- Automatic Data Augmentation Selection And Parametrization In Contrastive Self-supervised Speech Representation Learning (2022) 5.24
- Self-supervised Speaker Verification With Simple Siamese Network And Self-supervised Regularization (2021) 10.85
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023) 5.84
- Curriculum Learning For Self-supervised Speaker Verification (2022) 8.09
- Self-distillation Prototypes Network: Learning Robust Speaker Representations Without Supervision (2023) 4.52
- Self-supervised Learning Based Domain Adaptation For Robust Speaker Verification (2021) 11.49
- Robust Speaker Recognition Using Unsupervised Adversarial Invariance (2019) 9.76
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017) 14.23