MobileSpeech: A Fast And High-fidelity Framework For Mobile Zero-shot Text-to-speech
2024 · Shengpeng Ji, Ziyue Jiang, Hanting Wang, et al.
Abstract
Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of voice prompt from an unseen speaker. However, all previous work has been developed for cloud-based systems. Autoregressive models, for example, achieve high-fidelity voice cloning but fall short in inference speed, model size, and robustness. We therefore propose MobileSpeech, the first fast, lightweight, and robust zero-shot text-to-speech system for mobile devices. Specifically: 1) leveraging a discrete codec, we design a parallel speech mask decoder module called SMD, which incorporates hierarchical information from the speech codec and weight mechanisms across different codec layers during the generation process. Moreover, to bridge the gap between text and speech, we introduce a high-level probabilistic mask that simulates the progression of information flow from less to more du
Related papers
- LiveSpeech: Low-latency Zero-shot Text-to-speech Via Autoregressive Modeling Of Audio Discrete Codes (2024) 5.84
- MaskGCT: Zero-shot Text-to-speech With Masked Generative Codec Transformer (2024) 7.98
- ControlSpeech: Towards Simultaneous And Independent Zero-shot Speaker Cloning And Zero-shot Language Style Control (2024) 9.40
- SyncSpeech: Efficient And Low-latency Text-to-speech Based On Temporal Masked Transformer (2025) 0.00
- DeviceTTS: A Small-footprint, Fast, Stable Network For On-device Text-to-speech (2020) 0.00
- High Fidelity Text-to-speech Via Discrete Tokens Using Token Transducer And Group Masked Language Model (2024) 4.52
- ZMM-TTS: Zero-shot Multilingual And Multispeaker Speech Synthesis Conditioned On Self-supervised Discrete Speech Representations (2023) 10.35
- Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning (2023) 6.34