SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis
2020 · Bohan Zhai, Tianren Gao, Flora Xue, et al.
Abstract
Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.
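The abstract measures cost in MACs (multiply-accumulate operations). As a rough illustration of how such a count and a reduction factor are computed, here is a minimal sketch for a single 1D convolution layer. The helper names and layer sizes are hypothetical, chosen only for illustration. They are not taken from the paper and do not describe the SqueezeWave or WaveGlow architectures.

```python
def conv1d_macs(c_in: int, c_out: int, kernel_size: int,
                out_length: int, groups: int = 1) -> int:
    """MACs for one 1D convolution layer (bias ignored).

    Each output sample in each output channel needs
    (c_in / groups) * kernel_size multiply-accumulates.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * out_length


def reduction_factor(baseline_macs: int, reduced_macs: int) -> float:
    """How many times fewer MACs the reduced layer needs."""
    return baseline_macs / reduced_macs


# Hypothetical sizes (assumptions, not values from the paper).
channels, kernel, length = 256, 3, 2000

# A standard (dense) convolution.
dense = conv1d_macs(channels, channels, kernel, length)

# A generic cheaper alternative: depthwise conv followed by a 1x1 conv.
depthwise = conv1d_macs(channels, channels, kernel, length, groups=channels)
pointwise = conv1d_macs(channels, channels, 1, length)
separable = depthwise + pointwise

print(f"dense:     {dense:,} MACs")
print(f"separable: {separable:,} MACs")
print(f"reduction: {reduction_factor(dense, separable):.2f}x")
```

The count scales linearly with channel widths, kernel size, and output length, so shrinking any of these lowers the MAC total for a layer. The "61x - 214x" figures in the abstract refer to whole-model totals reported by the authors, not to this toy layer.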
Related papers
- FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-to-speech on the Edge (2020)
- WaveGlow: A Flow-based Generative Network for Speech Synthesis (2018)
- Puffin: Pitch-synchronous Neural Waveform Generation for Fullband Speech on Modest Devices (2022)
- Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion (2022)
- FlowVocoder: A Small Footprint Neural Vocoder Based Normalizing Flow for Speech Synthesis (2021)
- FeatherWave: An Efficient High-fidelity Neural Vocoder with Multi-band Linear Prediction (2020)
- Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-resolution Spectrogram (2019)
- Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity (2022)