MnTTS2: An Open-source Multi-speaker Mongolian Text-to-speech Synthesis Dataset
2022 · Kailin Liang, Bin Liu, Yifan Hu, et al.
Abstract
Text-to-Speech (TTS) synthesis for low-resource languages is an active research topic in both academia and industry. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, open-source datasets for Mongolian TTS remain scarce. We therefore release an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers. In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of Mongolian speech, resulting in 30 hours in total. Furthermore, we build a baseline system based on the state-of-the-art FastSpeech2 model and HiFi-GAN vocoder. The experimental results suggest that the constructed MnTTS2 dataset is sufficient to build robust multi-speaker TTS models for real-world applic
Related papers
- MnTTS: An Open-source Mongolian Text-to-speech Synthesis Dataset And Accompanied Baseline (2022) · 5.24
- Low-resource Mongolian Speech Synthesis Based On Automatic Prosody Annotation (2022) · 0.00
- TMD-TTS: A Unified Tibetan Multi-dialect Text-to-speech Framework For Ü-Tsang, Amdo And Kham Speech Dataset Generation (2025) · 0.00
- MSceneSpeech: A Multi-scene Speech Dataset For Expressive Speech Synthesis (2024) · 0.00
- EM-TTS: Efficiently Trained Low-resource Mongolian Lightweight Text-to-speech (2024) · 0.00
- MParrotTTS: Multilingual Multi-speaker Text To Speech Synthesis In Low Resource Setting (2023) · 0.00
- Synth2Aug: Cross-domain Speaker Recognition With TTS Synthesized Speech (2020) · 6.77
- WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus For Large Speech Generation Model Benchmark (2024) · 9.76