Enhancing Out-of-vocabulary Performance Of Indian TTS Systems For Practical Applications Through Low-effort Data Strategies
2024 · Srija Anand, Praveen Srinivasa Varadhan, Ashwin Sankar, et al.
Abstract
Publicly available TTS datasets for low-resource languages such as Hindi and Tamil typically contain 10-20 hours of data, which leads to poor vocabulary coverage. This limitation becomes evident in downstream applications, where domain-specific vocabulary, coupled with frequent code-mixing with English, results in many out-of-vocabulary (OOV) words. To highlight this problem, we create a benchmark containing OOV words from several real-world applications. Indeed, state-of-the-art Hindi and Tamil TTS systems perform poorly on this OOV benchmark, as indicated by intelligibility tests. To improve the model's OOV performance, we propose a low-effort and economically viable strategy for obtaining more training data. Specifically, we propose using volunteers, rather than high-quality voice artists, to record words containing character bigrams unseen in the training data. We show that with such inexpensive data, the model's performance on OOV words improves without affecting voice quality or in-domain performance.
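The data strategy above hinges on finding character bigrams that never occur in the training transcripts and then choosing words that contain them for volunteers to record. The following is a minimal Python sketch of that selection step under several assumptions not stated in the abstract: a "character" is a Unicode code point, transcripts are whitespace-tokenized, and words are picked with a greedy set-cover heuristic under a hypothetical recording budget (`budget`). The function names `char_bigrams`, `unseen_bigrams`, and `select_words` are illustrative only and do not describe the authors' actual pipeline.

```python
from __future__ import annotations

from collections.abc import Iterable


def char_bigrams(word: str) -> set[str]:
    """Return the set of adjacent code-point pairs in a word.

    Assumption: a "character" is a Unicode code point; the paper may
    define characters differently (e.g. grapheme clusters).
    """
    return {word[i:i + 2] for i in range(len(word) - 1)}


def unseen_bigrams(train_sentences: Iterable[str], candidates: Iterable[str]) -> set[str]:
    """Bigrams that occur in candidate words but never in the training transcripts."""
    seen: set[str] = set()
    for sentence in train_sentences:
        for word in sentence.split():  # assumption: whitespace tokenization
            seen |= char_bigrams(word)
    missing: set[str] = set()
    for word in candidates:
        missing |= char_bigrams(word) - seen
    return missing


def select_words(candidates: list[str], targets: set[str], budget: int) -> list[str]:
    """Hypothetical greedy set cover: repeatedly pick the candidate word that
    covers the most still-uncovered target bigrams, up to `budget` words."""
    remaining = set(targets)
    pool = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    chosen: list[str] = []
    while remaining and pool and len(chosen) < budget:
        # max() returns the first word among ties, so selection is deterministic
        best = max(pool, key=lambda w: len(char_bigrams(w) & remaining))
        gain = char_bigrams(best) & remaining
        if not gain:
            break  # no remaining candidate adds a new bigram
        chosen.append(best)
        remaining -= gain
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    train = ["the cat sat", "a hat"]
    candidates = ["chat", "that", "tax"]
    targets = unseen_bigrams(train, candidates)
    print(sorted(targets))
    print(select_words(candidates, targets, budget=5))
```

With these toy inputs the unseen bigrams are `ax`, `ch`, and `ta`, and `select_words(..., budget=5)` returns `["tax", "chat"]`: `tax` covers two new bigrams, while `that` adds none and is never chosen. For Hindi or Tamil, code-point bigrams include dependent vowel signs and the virama, so bigrams over grapheme clusters may be a better fit; this sketch does not attempt that segmentation.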
Related papers
- Towards Building Text-to-speech Systems For The Next Billion Users (2022)
- A Unified Framework For Collecting Text-to-speech Synthesis Datasets For 22 Indian Languages (2024)
- ELAICHI: Enhancing Low-resource TTS By Addressing Infrequent And Low-frequency Character Bigrams (2024)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- IndicVoices-R: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS (2024)
- Custom Data Augmentation For Low Resource ASR Using Bark And Retrieval-based Voice Conversion (2023)
- Using Synthetic Audio To Improve The Recognition Of Out-of-vocabulary Words In End-to-end ASR Systems (2020)
- Enhancing Out-of-domain Utterance Detection With Data Augmentation Based On Word Embeddings (2019)