SNIPER Training: Single-shot Sparse Training For Text-to-speech
2022 · Perry Lam, Huayun Zhang, Nancy F. Chen, et al.
Abstract
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods of incrementing sparsity to a desired target, which costs significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that we were able to obtain better losses in the first few training epochs with SNIPER, and that the final SNIPER-trained models outperformed constant-sparsity models and edged out dense models, with negligi
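The decaying-sparsity idea in the abstract can be illustrated with a small sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the function names `decaying_sparsity` and `saliency_mask` are hypothetical, the stepwise schedule (start at 40% sparsity, drop 10 points every two epochs) is made up, and the pruning criterion is assumed to be a SNIP-style saliency score |w * g|, which the abstract does not spell out.

```python
def decaying_sparsity(epoch, initial=0.4, step=0.1, epochs_per_step=2):
    """Hypothetical schedule: begin at `initial` sparsity, then lower it
    by `step` every `epochs_per_step` epochs until the model is dense."""
    drops = epoch // epochs_per_step
    # round() guards against floating-point drift such as 0.30000000000000004
    return max(0.0, round(initial - step * drops, 10))


def saliency_mask(weights, grads, sparsity):
    """Assumed SNIP-style criterion: score each weight by |w * g| and
    zero out the lowest-scoring `sparsity` fraction (1 = keep, 0 = prune)."""
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    n_prune = int(round(sparsity * len(scores)))
    # Indices ordered from least to most salient
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


# Toy usage: recompute the mask whenever the scheduled sparsity changes.
weights = [1.0, -2.0, 0.5, 3.0]
grads = [0.1, 0.1, 0.1, 0.1]
for epoch in range(10):
    s = decaying_sparsity(epoch)
    mask = saliency_mask(weights, grads, s)
    masked = [w * m for w, m in zip(weights, mask)]
```

In this sketch the sparsity only ever decreases, the opposite of schedules that ramp sparsity up to a target, so weights pruned early are allowed back into training as the rate decays.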
Related papers
- EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-to-speech Models (2022)
- On The Interplay Between Sparsity, Naturalness, Intelligibility, And Prosody In Speech Synthesis (2021)
- Dynamic Sparsity Neural Networks For Automatic Speech Recognition (2020)
- Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning (2023)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- SPADE: Structured Pruning And Adaptive Distillation For Efficient LLM-TTS (2025)
- Speak, Read And Prompt: High-fidelity Text-to-speech With Minimal Supervision (2023)
- HAM-TTS: Hierarchical Acoustic Modeling For Token-based Zero-shot Text-to-speech With Model And Data Scaling (2024)