Exploring Voice Conversion Based Data Augmentation In Text-dependent Speaker Verification
2020 · Xiaoyi Qin, Yaogen Yang, Lin Yang, et al.
Abstract
In this paper, we focus on improving the performance of text-dependent speaker verification systems when training data is limited. Deep learning based text-dependent speaker verification systems generally need a large-scale text-dependent training dataset, which can be labor- and cost-intensive to collect, especially for customized new wake-up words. Recent studies have proposed voice conversion systems that can generate high-quality synthesized speech for both seen and unseen speakers. Inspired by these works, we adopt two different voice conversion methods, as well as a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation. Experimental results show that the proposed method significantly reduces the Equal Error Rate (EER) from 6.51% to 4.51% in the limited-training-data scenario.
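The abstract does not spell out how the re-sampling augmentation works. As an illustration only, the sketch below assumes it resembles speed perturbation: each utterance is re-sampled by a factor (0.9 and 1.1 here, an assumed choice), which shifts both tempo and pitch, and each re-sampled copy is labeled as a new pseudo-speaker. The function names and the `_sp<factor>` labeling scheme are hypothetical, not taken from the paper.

```python
import numpy as np


def resample_speed(wave: np.ndarray, factor: float) -> np.ndarray:
    """Speed-perturb a 1-D waveform by `factor` using linear interpolation.

    factor > 1 shortens the signal (faster, higher pitch);
    factor < 1 lengthens it (slower, lower pitch).
    Note: no anti-aliasing filter is applied; a real pipeline would
    use a proper band-limited resampler instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(wave) / factor))
    # Position in the original signal that each output sample reads from.
    src_pos = np.arange(n_out) * factor
    # np.interp clamps positions past the end to the last sample.
    return np.interp(src_pos, np.arange(len(wave)), wave)


def augment_with_resampling(utterances, factors=(0.9, 1.1)):
    """Return the original utterances plus re-sampled copies.

    utterances: list of (speaker_id, waveform) tuples.
    Hypothetical labeling: each (speaker, factor) pair is treated as a
    new pseudo-speaker, since re-sampling changes the voice's pitch.
    """
    out = list(utterances)
    for spk, wave in utterances:
        for f in factors:
            out.append((f"{spk}_sp{f}", resample_speed(wave, f)))
    return out
```

For example, `augment_with_resampling([("spk1", wave)])` yields three utterances labeled `spk1`, `spk1_sp0.9` and `spk1_sp1.1`, tripling the number of training speakers for the fixed phrase without recording anyone new. Treating re-sampled copies as distinct speakers, rather than as extra samples of the same speaker, is the assumption that makes this useful for a speaker-discriminative training objective.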
Related papers
- Data Augmentation Enhanced Speaker Enrollment For Text-dependent Speaker Verification (2020)
- Unit Selection Synthesis Based Data Augmentation For Fixed Phrase Speaker Verification (2021)
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)
- Speaker Verification-derived Loss And Data Augmentation For DNN-based Multispeaker Speech Synthesis (2021)
- Relational Data Selection For Data Augmentation Of Speaker-dependent Multi-band MelGAN Vocoder (2021)
- Voice Conversion Can Improve ASR In Very Low-resource Settings (2021)
- Voice Conversion Augmentation For Speaker Recognition On Defective Datasets (2024)
- Cross-speaker Emotion Transfer For Low-resource Text-to-speech Using Non-parallel Voice Conversion With Pitch-shift Data Augmentation (2022)