Efficient Neural Speech Synthesis For Low-resource Languages Through Multilingual Modeling
2020 · Marcel de Korte, Jaebok Kim, Esther Klabbers
Abstract
Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements for a new voice, this approach is usually not viable for many low-resource languages, for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent multilingual multi-speaker modeling can be an alternative to monolingual multi-speaker modeling, and explored how data from foreign languages may best be combined with low-resource language data. We found that multilingual modeling can increase the naturalness of low-resource language speech, showed that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models, and saw that the target language naturalness was affected by the strategy used to add foreign language data.
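The abstract does not state how the foreign-language data was combined with the low-resource language data. Purely as an illustration of the general idea, the Python sketch below shows one generic mixing scheme: every utterance carries a speaker label and a language label (the conditioning information a multilingual multi-speaker model needs), and each training batch reserves a fixed fraction of its slots for the target language, upsampling the small target set with replacement. The `Utterance` type, `build_mixed_batch`, the `target_fraction` parameter, and the placeholder language code `"xx"` are hypothetical names; this is not the authors' strategy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # Hypothetical training record: text plus speaker and language labels.
    text: str
    speaker: str
    language: str


def build_mixed_batch(target, foreign, batch_size, target_fraction, rng):
    """Draw a batch in which a fixed share of utterances comes from the
    low-resource target language and the rest from foreign languages.

    Hypothetical scheme for illustration only. Sampling is with replacement,
    so a small target-language set is effectively upsampled.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be in [0, 1]")
    n_target = round(batch_size * target_fraction)
    batch = rng.choices(target, k=n_target) + rng.choices(
        foreign, k=batch_size - n_target
    )
    rng.shuffle(batch)  # avoid a fixed target/foreign ordering within the batch
    return batch


# Usage: 3 target-language utterances ("xx" is a placeholder code)
# mixed with 100 English utterances from 4 speakers.
rng = random.Random(0)
target = [Utterance(f"tgt {i}", "tgt_spk", "xx") for i in range(3)]
foreign = [Utterance(f"en {i}", f"en_spk{i % 4}", "en") for i in range(100)]
batch = build_mixed_batch(target, foreign, batch_size=8, target_fraction=0.25, rng=rng)
print(sum(u.language == "xx" for u in batch))  # → 2
```

Fixing the per-batch share, rather than simply concatenating all data, keeps the low-resource language from being swamped when the foreign data is much larger; other schemes (for example plain pooling or pre-training on foreign data and then fine-tuning) are equally plausible readings of "strategy" here.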
Related papers
- Combining Speakers Of Multiple Languages To Improve Quality Of Neural Voices (2021)
- Multilingual Byte2speech Models For Scalable Low-resource Speech Synthesis (2021)
- Modeling Multi-speaker Latent Space To Improve Neural TTS: Quick Enrolling New Speaker And Enhancing Premium Voice (2018)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)
- Extending Multilingual Speech Synthesis To 100+ Languages Without Transcribed Data (2024)
- Towards High-quality Neural TTS For Low-resource Languages By Learning Compact Speech Representations (2022)
- Building A Mixed-lingual Neural TTS System With Only Monolingual Data (2019)