On The Interplay Between Sparsity, Naturalness, Intelligibility, And Prosody In Speech Synthesis
2021 · Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, et al.
Abstract
Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explore several aspects of TTS pruning: amount of finetuning data versus sparsity, TTS-Augmentation to utilize unspoken text, and combining knowledge distillation and pruning. Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility, with similar prosody. All of our experiments are conducted on publicly available models, and findings in this work are backed by large-scale subjective tests and objective measures. Code and 200 pruned models are made available.
Related papers
- EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-to-speech Models (2022)
- SNIPER Training: Single-shot Sparse Training For Text-to-speech (2022)
- Evaluating Text-to-speech Synthesis From A Large Discrete Token-based Speech Language Model (2024)
- Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning (2023)
- Applying Syntax–prosody Mapping Hypothesis And Prosodic Well-formedness Constraints To Neural Sequence-to-sequence Speech Synthesis (2022)
- Spontaneous Style Text-to-speech Synthesis With Controllable Spontaneous Behaviors Based On Language Models (2024)
- Controllable Neural Text-to-speech Synthesis Using Intuitive Prosodic Features (2020)
- Efficient Neural Speech Synthesis For Low-resource Languages Through Multilingual Modeling (2020)