Using IPA-based Tacotron For Data Efficient Cross-lingual Speaker Adaptation And Pronunciation Enhancement
2020 · Hamed Hemati, Damian Borth
Abstract
Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model for new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.
Related papers
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- Training Text-to-speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks (2022)
- Towards End-to-end Prosody Transfer For Expressive Speech Synthesis With Tacotron (2018)
- End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019)
- Leveraging Parameter-efficient Transfer Learning For Multi-lingual Text-to-speech Adaptation (2024)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)