Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis
2025 · Yizhong Geng, Jizhuo Xu, Zeyu Liang, et al.
Abstract
Text-to-speech (TTS) technology has achieved impressive results for widely spoken languages, yet many under-resourced languages remain challenged by limited data and linguistic complexities. In this paper, we present a novel methodology that integrates a data-optimized framework with an advanced acoustic model to build high-quality TTS systems for low-resource scenarios. We demonstrate the effectiveness of our approach using Thai as an illustrative case, where intricate phonetic rules and sparse resources are effectively addressed. Our method enables zero-shot voice cloning and improved performance across diverse client applications, ranging from finance to healthcare, education, and law. Extensive subjective and objective evaluations confirm that our model meets state-of-the-art standards, offering a scalable solution for TTS production in data-limited settings, with significant implications for broader industry adoption and multilingual accessibility.
Related papers
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- Efficient Neural Speech Synthesis For Low-resource Languages Through Multilingual Modeling (2020)
- Towards Building Text-to-speech Systems For The Next Billion Users (2022)
- ELAICHI: Enhancing Low-resource TTS By Addressing Infrequent And Low-frequency Character Bigrams (2024)
- HAM-TTS: Hierarchical Acoustic Modeling For Token-based Zero-shot Text-to-speech With Model And Data Scaling (2024)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- Low-data? No Problem: Low-resource, Language-agnostic Conversational Text-to-speech Via F0-conditioned Data Augmentation (2022)
- An Automated End-to-end Open-source Software For High-quality Text-to-speech Dataset Generation (2024)