VANI: Very-lightweight Accent-controllable TTS For Native And Non-native Speakers With Identity Preservation
2023 · Rohan Badlani, Akshit Arora, Subhankar Ghosh, et al.
Abstract
We introduce VANI, a very lightweight, multilingual, accent-controllable speech synthesis system. Our model builds upon the disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker, and fine-grained \(F_0\) and energy features for speech synthesis. We use the Indic languages dataset, released for LIMMITS 2023 as part of the ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We use the large-parameter RADMMM model for Track \(1\) and the lightweight VANI model for Tracks \(2\) and \(3\) of the competition.
Related papers
- DART: Disentanglement Of Accent And Speaker Representation In Multispeaker Text-to-speech (2024)
- Accent-VITS: Accent Transfer For End-to-end TTS (2023)
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- Accented Text-to-speech Synthesis With A Conditional Variational Autoencoder (2022)
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- VECL-TTS: Voice Identity And Emotional Style Controllable Cross-lingual Text-to-speech (2024)
- Scaling Nvidia's Multi-speaker Multi-lingual TTS Systems With Zero-shot TTS To Indic Languages (2024)
- Cross-dialect Text-to-speech In Pitch-accent Language Incorporating Multi-dialect Phoneme-level BERT (2024)