Relational Data Selection For Data Augmentation Of Speaker-dependent Multi-band MelGAN Vocoder
2021 · Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, et al.
Abstract
Nowadays, neural vocoders can generate very high-fidelity speech when a large amount of training data is available. Although a speaker-dependent (SD) vocoder usually outperforms a speaker-independent (SI) vocoder, it is impractical to collect a large amount of data from a specific target speaker for most real-world applications. To tackle the problem of limited target data, this paper proposes a data augmentation method based on the speaker representations and similarity measurements used in speaker verification. The proposed method selects utterances whose speaker identity is similar to that of the target speaker from an external corpus, and then combines the selected utterances with the limited target data for SD vocoder adaptation. The evaluation results show that, compared with the vocoder adapted using only the limited target data, the vocoder adapted using the augmented data improves both the quality and the speaker similarity of the synthesized speech.
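The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which speaker representation or similarity measure is used, so the choices here are assumptions made only for illustration: each utterance is represented by a fixed-dimensional speaker embedding, the target speaker is summarized by the mean of its length-normalized embeddings, and similarity is plain cosine similarity. The function name `select_similar_utterances`, the parameter `top_k`, and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np


def select_similar_utterances(target_embs, candidate_embs, top_k):
    """Rank external-corpus utterances by similarity to the target speaker.

    target_embs:    (n_target, dim) embeddings of the limited target utterances
    candidate_embs: (n_candidates, dim) embeddings of external-corpus utterances
    Returns (indices of the top_k most similar candidates, all similarity scores).
    """
    target_embs = np.asarray(target_embs, dtype=float)
    candidate_embs = np.asarray(candidate_embs, dtype=float)

    # Assumption: summarize the target speaker by the mean of the
    # length-normalized target embeddings, then renormalize.
    t = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    centroid = t.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    # Assumption: cosine similarity between each candidate and the centroid.
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ centroid

    # Sort candidates from most to least similar and keep the first top_k.
    order = np.argsort(-scores, kind="stable")
    return order[:top_k], scores


# Toy 2-D embeddings standing in for real speaker embeddings.
target = np.array([[1.0, 0.0], [0.9, 0.1]])
candidates = np.array([[1.0, 0.05], [0.0, 1.0], [0.8, 0.3], [-1.0, 0.0]])

selected, scores = select_similar_utterances(target, candidates, top_k=2)

# Augmented adaptation set: all target utterances plus the selected external ones.
augmented_ids = ["target_0", "target_1"] + [f"external_{i}" for i in selected]
print(augmented_ids)
```

In this sketch the augmented set is simply the union of the target utterances and the selected external utterances. How many utterances to select, and whether a threshold would be a better criterion than a fixed `top_k`, are design choices the abstract leaves open.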
Related papers
- Speaker Verification-derived Loss And Data Augmentation For DNN-based Multispeaker Speech Synthesis (2021)
- Training Generative Adversarial Network-based Vocoder With Limited Data Using Augmentation-conditional Discriminator (2024)
- Exploring Voice Conversion Based Data Augmentation In Text-dependent Speaker Verification (2020)
- Universal MelGAN: A Robust Neural Vocoder For High-fidelity Waveform Generation In Multiple Domains (2020)
- Data Augmentation Enhanced Speaker Enrollment For Text-dependent Speaker Verification (2020)
- Unit Selection Synthesis Based Data Augmentation For Fixed Phrase Speaker Verification (2021)
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022)
- Adaptive Data Augmentation With NaturalSpeech3 For Far-field Speaker Verification (2025)