Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost

Abstract

arXiv:2604.25441v1

Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured phonological dimensions, and the most widely adopted multilingual base (Chatterbox, 23 languages) does not even tokenise Telugu or Tamil. We ask: what is the minimum intervention that brings such a non-Indic-native base to commercial-class output on Telugu, Tamil, and Hindi, without training a new acoustic decoder and without any commercial TTS training data? We combine three pieces: (1) BUPS, a Brahmic Unified Phoneme Space that deterministically romanises seven Indic scripts to ISO-15919 so Chatterbox's Latin tokeniser can process them; (2) a LoRA adapter on only the text-token predictor (Chatterbox's t3), trained on ~1,220h of licensed Indic audio with a Hindi-proxy language_id; (3) a voice-prompt recovery recipe -- an 8-11s same-language reference clip plus three sampling overrides (exa
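
To make the idea of deterministic romanisation concrete, here is a minimal sketch for Devanagari only. It is not the authors' BUPS implementation: the function name `romanise`, the table names, and the small character tables are all hypothetical and cover only a handful of letters. It illustrates the general abugida rule that a consonant carries an inherent "a" unless a vowel sign replaces it or a virama suppresses it, with outputs following ISO 15919 conventions (for example ē and ō for the Devanagari vowels ए and ओ, and ṁ for anusvara).

```python
# Illustrative only: tiny, hypothetical tables for a few Devanagari letters.
# A real system would need full coverage of every supported script.
CONSONANTS = {
    "क": "k", "ग": "g", "त": "t", "द": "d", "न": "n",
    "प": "p", "म": "m", "र": "r", "ल": "l", "स": "s", "ह": "h",
}
INDEPENDENT_VOWELS = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī",
    "उ": "u", "ऊ": "ū", "ए": "ē", "ओ": "ō",
}
VOWEL_SIGNS = {
    "ा": "ā", "ि": "i", "ी": "ī", "ु": "u",
    "ू": "ū", "े": "ē", "ो": "ō",
}
OTHER_SIGNS = {"ं": "ṁ"}  # anusvara
VIRAMA = "्"  # suppresses the inherent vowel


def romanise(text: str) -> str:
    """Map Devanagari text to Latin letters, one code point at a time."""
    out = []
    pending_a = False  # True right after a consonant: inherent "a" not yet emitted
    for ch in text:
        if pending_a and ch in VOWEL_SIGNS:
            # A dependent vowel sign replaces the inherent "a".
            out.append(VOWEL_SIGNS[ch])
            pending_a = False
            continue
        if pending_a and ch == VIRAMA:
            # A virama removes the inherent "a" (consonant cluster).
            pending_a = False
            continue
        if pending_a:
            out.append("a")
            pending_a = False
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in OTHER_SIGNS:
            out.append(OTHER_SIGNS[ch])
        else:
            out.append(ch)  # pass through anything not in the tables
    if pending_a:
        out.append("a")
    return "".join(out)


print(romanise("नमस्ते"))
print(romanise("हिंदी"))
```

Because the mapping is a pure table lookup with no learned component, the same input always yields the same Latin string, which is the property that lets a Latin-only tokeniser consume the text.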
