ParrotTTS: Text-to-speech Synthesis By Exploiting Self-supervised Representations
2023 · Neil Shah, Saiteja Kosgi, Vishal Tambrahalli, et al.
Abstract
We present ParrotTTS, a modularized text-to-speech synthesis model that leverages disentangled self-supervised speech representations. It can train a multi-speaker variant effectively using transcripts from a single speaker. ParrotTTS adapts to a new language in a low-resource setup and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual or parallel examples, ParrotTTS can transfer voices across languages while preserving speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results in monolingual and multi-lingual scenarios. ParrotTTS outperforms state-of-the-art multi-lingual TTS models while using only a fraction of the paired data that the latter require.
Related papers
- MParrotTTS: Multilingual Multi-speaker Text To Speech Synthesis In Low Resource Setting (2023)
- Semi-supervised Learning For Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation (2020)
- ZMM-TTS: Zero-shot Multilingual And Multispeaker Speech Synthesis Conditioned On Self-supervised Discrete Speech Representations (2023)
- Speak, Read And Prompt: High-fidelity Text-to-speech With Minimal Supervision (2023)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- Extending Multilingual Speech Synthesis To 100+ Languages Without Transcribed Data (2024)
- Maximizing Data Efficiency For Cross-lingual TTS Adaptation By Self-supervised Representation Mixing And Embedding Initialization (2024)