IndicVoices-R: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS
2024 · Ashwin Sankar, Srija Anand, Praveen Srinivasa Varadhan, et al.
Abstract
Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations collected in low-quality environments to generate high-quality TTS training data. Our pipeline leverages the cross-lingual generalization of denoising and speech enhancement models trained on English and applied to Indian languages. This results in IndicVoices-R (IV-R), the largest multilingual Indian TTS dataset derived from an ASR dataset, with 1,704 hours of high-quality speech from 10,496 speakers across 22 Indian languages. IV-R matches the quality of gold-standard TTS datasets like LJSpeech, LibriTTS, and IndicTTS. We also introduce the IV-R Benchmark, the first to assess zero-shot, few-
Related papers
- A Unified Framework For Collecting Text-to-speech Synthesis Datasets For 22 Indian Languages (2024)
- Towards Building Text-to-speech Systems For The Next Billion Users (2022)
- Generic Indic Text-to-speech Synthesisers With Rapid Adaptation In An End-to-end Framework (2020)
- ELAICHI: Enhancing Low-resource TTS By Addressing Infrequent And Low-frequency Character Bigrams (2024)
- Enhancing Out-of-vocabulary Performance Of Indian TTS Systems For Practical Applications Through Low-effort Data Strategies (2024)
- Praxy Voice: Voice-prompt Recovery + BUPS For Commercial-class Indic TTS From A Frozen Non-indic Base At Zero Commercial-training-data Cost (2026)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- Scaling Nvidia's Multi-speaker Multi-lingual TTS Systems With Zero-shot TTS To Indic Languages (2024)