VANI: Very-lightweight Accent-controllable TTS For Native And Non-native Speakers With Identity Preservation

Abstract

We introduce VANI, a very lightweight, multilingual, accent-controllable speech synthesis system. Our model builds on the disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, and speaker, as well as fine-grained \(F_0\) and energy features. We use the Indic languages dataset released for LIMMITS 2023, part of the ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model can transfer a speaker to a new language while retaining their voice and the native accent of the target language. We use the large-parameter RADMMM model for Track 1 and the lightweight VANI model for Tracks 2 and 3 of the competition.
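The idea of explicit, disentangled control can be illustrated with a toy sketch. This is not VANI's or RADMMM's architecture. It assumes only that speaker identity and accent come from separate embedding tables and are combined with per-frame \(F_0\) and energy values into a conditioning vector. The names `make_table`, `conditioning`, `EMB_DIM`, and the speaker/accent labels are all hypothetical. Under this assumption, cross-lingual transfer keeps the speaker ID fixed and swaps the accent ID to the target language's native accent.

```python
import random

# Hypothetical embedding size; not taken from the paper.
EMB_DIM = 4


def make_table(names, dim=EMB_DIM, seed=0):
    """Build a toy lookup table of fixed random vectors, one per name."""
    rng = random.Random(seed)
    return {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in names}


# Hypothetical speaker and accent inventories (illustrative labels only).
SPEAKERS = make_table(["speaker_1", "speaker_2", "speaker_3"], seed=1)
ACCENTS = make_table(["lang_1", "lang_2", "lang_3"], seed=2)


def conditioning(speaker, accent, f0, energy):
    """Concatenate independent speaker and accent embeddings with the
    per-frame F0 and energy values, giving one conditioning vector per frame."""
    spk = SPEAKERS[speaker]
    acc = ACCENTS[accent]
    return [spk + acc + [f, e] for f, e in zip(f0, energy)]


# Toy cross-lingual transfer: keep speaker_1's identity embedding but
# condition on the accent embedding of the target language lang_2.
frames = conditioning("speaker_1", "lang_2", f0=[180.0, 175.0], energy=[0.6, 0.5])
```

Because the speaker and accent vectors occupy separate slots, changing one leaves the other untouched. This separation is the property a disentangled model aims to learn, whereas this sketch simply hard-codes it.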
