Data Efficient Voice Cloning From Noisy Samples With Domain Adversarial Training
2020 · Jian Cong, Shan Yang, Lei Xie, et al.
Abstract
Data efficient voice cloning aims at synthesizing a target speaker's voice with only a few enrollment samples at hand. To this end, speaker adaptation and speaker encoding are two typical methods, both built on a base model trained on multiple speakers. The former uses a small set of target speaker data to transfer the multi-speaker model to the target speaker's voice through direct model update, while in the latter, only a few seconds of the target speaker's audio goes directly through an extra speaker encoding model along with the multi-speaker model to synthesize the target speaker's voice without model update. Both methods, however, need clean target speaker data, whereas the samples provided by users may inevitably contain acoustic noise in real applications. It is still challenging to generate the target voice from noisy data. In this paper, we study the data efficient voice cloning problem from noisy samples under the sequence-to-sequence based TTS paradigm. Specifically, we introduce domain ad
Related papers
- Data Efficient Voice Cloning For Neural Singing Synthesis (2019) · 10.07
- Neural Voice Cloning With A Few Samples (2018) · 0.00
- Spoken Language Corpora Augmentation With Domain-specific Voice-cloned Speech (2024) · 0.00
- Adapting TTS Models For New Speakers Using Transfer Learning (2021) · 0.00
- Voice Cloning: A Multi-speaker Text-to-speech Synthesis Approach Based On Transfer Learning (2021) · 0.00
- Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data (2021) · 0.00
- Single And Multi-speaker Cloned Voice Detection: From Perceptual To Learned Features (2023) · 9.23
- Adversarial Speaker-consistency Learning Using Untranscribed Speech Data For Zero-shot Multi-speaker Text-to-speech (2022) · 4.52