ELAICHI: Enhancing Low-resource TTS By Addressing Infrequent And Low-frequency Character Bigrams
2024 · Srija Anand, Praveen Srinivasa Varadhan, Mehak Singal, et al.
Abstract
Recent advancements in Text-to-Speech (TTS) technology have led to natural-sounding speech for English, primarily due to the availability of large-scale, high-quality web data. However, many other languages lack access to such resources, relying instead on limited studio-quality data. This scarcity results in synthesized speech that often suffers from intelligibility issues, particularly with low-frequency character bigrams. In this paper, we propose three solutions to address this challenge. First, we leverage high-quality data from linguistically or geographically related languages to improve TTS for the target language. Second, we utilize low-quality Automatic Speech Recognition (ASR) data recorded in non-studio environments, which is refined using denoising and speech enhancement models. Third, we apply knowledge distillation from large-scale models using synthetic data to generate more robust outputs. Our experiments with Hindi demonstrate significant reductions in intelligibility
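To make the notion of a low-frequency character bigram concrete, the following is a minimal sketch of how one might count character bigrams in a training transcript set and flag the rare ones. It is not the authors' method: the function names `bigram_counts` and `rare_bigrams`, the `max_count` threshold, and the toy Latin-script corpus are all assumptions made for illustration. Bigrams are taken over Unicode code points within whitespace-delimited words, which is a simplification for scripts such as Devanagari where a written unit can span several code points.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count character bigrams inside each whitespace-delimited word.

    Assumption: a "character" is one Unicode code point, and bigrams
    do not cross word boundaries.
    """
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            # Slide a window of width 2 over the word.
            counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def rare_bigrams(counts, max_count=1):
    """Return the bigrams seen at most `max_count` times, sorted.

    `max_count` is a hypothetical threshold chosen for this sketch.
    """
    return sorted(b for b, c in counts.items() if c <= max_count)


if __name__ == "__main__":
    # Toy corpus standing in for TTS training transcripts.
    corpus = ["banana band", "bandana"]
    counts = bigram_counts(corpus)
    print(counts.most_common())
    print(rare_bigrams(counts, max_count=1))  # ['da']
```

A list like the one returned by `rare_bigrams` could then guide which additional data is most useful, for example by preferring extra utterances whose text contains the flagged bigrams.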
Related papers
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025) 0.00
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023) 0.00
- Enhancing Out-of-vocabulary Performance Of Indian TTS Systems For Practical Applications Through Low-effort Data Strategies (2024) 0.00
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021) 8.82
- IndicVoices-R: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS (2024) 2.26
- Audio-based Linguistic Feature Extraction For Enhancing Multi-lingual And Low-resource Text-to-speech (2024) 0.00
- Towards Building Text-to-speech Systems For The Next Billion Users (2022) 0.00
- Efficient Neural Speech Synthesis For Low-resource Languages Through Multilingual Modeling (2020) 8.60