Voice Conversion Augmentation For Speaker Recognition On Defective Datasets
2024 · Ruijie Tao, Zhan Shi, Yidi Jiang, et al.
Abstract
Modern speaker recognition systems rely on abundant and balanced datasets for classification training. However, defective datasets, such as partially labelled, small-scale, and imbalanced ones, are common in real-world applications. Previous works usually studied a specific algorithmic solution for each scenario, yet the root cause of these problems lies in the imperfections of the datasets themselves. To address these challenges with a unified solution, we propose the Voice Conversion Augmentation (VCA) strategy, which generates pseudo speech from the training set. Furthermore, to guarantee generation quality, we design the VCA-NN (nearest neighbours) strategy, which selects source speech from utterances that are close to the target speech in the representation space. Our experimental results on three constructed datasets demonstrate that VCA-NN effectively mitigates these dataset problems, providing a new direction for handling speaker recognition problems from the data aspect.
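The nearest-neighbour source selection behind VCA-NN can be sketched roughly as below. The abstract does not specify the representation space, the distance measure, or how many sources are chosen, so this is only an illustration under stated assumptions: each utterance already has a precomputed speaker embedding, closeness is measured by cosine similarity, utterances from the target speaker are excluded from the candidate pool, and the top `k` are kept. The names `select_source_utterances`, `cosine_similarity`, and `pool` are hypothetical, and the voice conversion step itself is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_source_utterances(
    target_emb: Embedding,
    target_speaker: str,
    pool: Dict[str, Tuple[str, Embedding]],
    k: int = 3,
) -> List[str]:
    """Return IDs of the k pool utterances closest to the target speech.

    Assumption (not from the paper): utterances spoken by the target speaker
    are skipped, so every selected source comes from a different speaker.
    """
    candidates = [
        (cosine_similarity(target_emb, emb), utt_id)
        for utt_id, (speaker, emb) in pool.items()
        if speaker != target_speaker
    ]
    # Highest similarity first; ties broken by utterance ID for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [utt_id for _, utt_id in candidates[:k]]


# Toy 2-D "embeddings"; real speaker embeddings are much higher-dimensional.
pool = {
    "u1": ("spkA", [1.0, 0.0]),
    "u2": ("spkB", [0.9, 0.1]),
    "u3": ("spkC", [0.0, 1.0]),
    "u4": ("spkD", [0.7, 0.7]),
}

print(select_source_utterances([1.0, 0.0], "spkA", pool, k=2))  # ['u2', 'u4']
```

Here `u1` is skipped because it belongs to the target speaker, and `u2` and `u4` are the two remaining utterances nearest the target embedding. The selected utterances would then serve as source speech for converting into the target speaker's voice; restricting sources to near neighbours is one plausible reading of how VCA-NN keeps the conversion gap small and the generated speech clean.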
Related papers
- Exploring Voice Conversion Based Data Augmentation In Text-dependent Speaker Verification (2020)
- Voice Conversion Can Improve ASR In Very Low-resource Settings (2021)
- Obovox Far Field Speaker Recognition: A Novel Data Augmentation Approach With Pretrained Models (2024)
- Speaker Verification-derived Loss And Data Augmentation For Dnn-based Multispeaker Speech Synthesis (2021)
- Augmentation Adversarial Training For Self-supervised Speaker Recognition (2020)
- Relational Data Selection For Data Augmentation Of Speaker-dependent Multi-band Melgan Vocoder (2021)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017)