Voice Cloning: A Multi-speaker Text-to-speech Synthesis Approach Based On Transfer Learning
2021 · Giuseppe Ruggiero, Enrico Zovato, Luigi di Caro, et al.
Abstract
Deep learning models are becoming predominant in many fields of machine learning. Text-to-Speech (TTS), the process of synthesizing artificial speech from text, is no exception. A deep neural network for TTS is usually trained on a corpus of several hours of recorded speech from a single speaker. Producing the voice of a speaker other than the one learned is expensive and requires considerable effort, since a new dataset must be recorded and the model retrained. This is the main reason why TTS models are usually single-speaker. The proposed approach aims to overcome these limitations by building a system that can model a multi-speaker acoustic space. This allows the generation of speech audio similar to the voice of different target speakers, even if they were not observed during the training phase.
Related papers
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)
- Data Efficient Voice Cloning For Neural Singing Synthesis (2019)
- Transfer Learning From Speaker Verification To Multispeaker Text-to-speech Synthesis (2018)
- Neural Voice Cloning With A Few Samples (2018)
- Deep Voice 2: Multi-speaker Neural Text-to-speech (2017)
- Cross-lingual Multi-speaker Text-to-speech Synthesis For Voice Cloning Without Using Parallel Corpus For Unseen Speakers (2019)
- Data Efficient Voice Cloning From Noisy Samples With Domain Adversarial Training (2020)