FlowVocoder: A Small Footprint Neural Vocoder Based Normalizing Flow For Speech Synthesis
2021 · Manh Luong, Viet Anh Tran
Abstract
Autoregressive neural vocoders have recently achieved remarkable performance in generating high-fidelity speech and can synthesize speech in real time. However, although autoregressive neural vocoders such as WaveFlow are capable of modeling waveform signals from mel-spectrograms, their parameter counts are too large for deployment on edge devices. NanoFlow, a state-of-the-art autoregressive neural vocoder with a small number of parameters, performs marginally worse than WaveFlow. Therefore, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is capable of generating high-fidelity audio in real time. Our proposed model improves the density estimation of flow blocks by using a mixture of Cumulative Distribution Functions (CDFs) for the bipartite transformation. Hence, the proposed model is capable of modeling waveform signals, while its memory footprint is much smaller than WaveF
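To make the "mixture of CDFs" idea concrete, here is a minimal NumPy sketch of an element-wise mixture-CDF transform of the kind a bipartite (coupling) flow can apply to one half of its input. The abstract does not say which component family FlowVocoder uses, so this sketch assumes logistic components followed by a logit, in the style of Flow++. The function names, the bisection-based inverse, and the idea that `logits`, `means`, and `log_scales` come from a conditioning network on the other half of the input are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _logsumexp(a):
    # Stable log(sum(exp(a))) over the last axis (the mixture components).
    m = a.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=-1, keepdims=True)))[..., 0]

def _log_sigmoid(z):
    # log(sigmoid(z)) computed without overflow.
    return -np.logaddexp(0.0, -z)

def mixture_cdf_forward(x, logits, means, log_scales):
    """Map x of shape (N,) through logit(mixture-of-logistics CDF).

    logits, means, log_scales have shape (N, K). In a coupling layer they
    would be predicted from the other half of the input (assumption).
    Returns y and log|dy/dx|, both of shape (N,).
    """
    log_w = logits - _logsumexp(logits)[:, None]        # log mixture weights
    z = (x[:, None] - means) * np.exp(-log_scales)
    log_cdf = _logsumexp(log_w + _log_sigmoid(z))        # log F(x)
    log_sf = _logsumexp(log_w + _log_sigmoid(-z))        # log (1 - F(x))
    log_pdf = _logsumexp(
        log_w + _log_sigmoid(z) + _log_sigmoid(-z) - log_scales
    )                                                    # log F'(x)
    y = log_cdf - log_sf                                 # logit(F(x))
    log_det = log_pdf - log_cdf - log_sf                 # log of F' / (F (1 - F))
    return y, log_det

def mixture_cdf_inverse(y, logits, means, log_scales,
                        x_min=-50.0, x_max=50.0, iters=80):
    """Invert the forward map by bisection (it is strictly increasing in x).

    Assumes the true x lies inside [x_min, x_max].
    """
    lo = np.full_like(y, x_min)
    hi = np.full_like(y, x_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        y_mid, _ = mixture_cdf_forward(mid, logits, means, log_scales)
        below = y_mid < y
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)
```

With a single component the transform reduces to an affine map, so the extra components are what give the coupling layer more expressive density estimation than an affine one, at the cost of a numerical inverse.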
Related papers
- Audio Dequantization For High Fidelity Audio Generation In Flow-based Neural Vocoder (2020)
- FloWaveNet: A Generative Flow For Raw Audio (2018)
- VoiceFlow: Efficient Text-to-speech With Rectified Flow Matching (2023)
- FBWave: Efficient And Scalable Neural Vocoders For Streaming Text-to-speech On The Edge (2020)
- Text-free Non-parallel Many-to-many Voice Conversion Using Normalising Flows (2022)
- Flowtron: An Autoregressive Flow-based Generative Network For Text-to-speech Synthesis (2020)
- V2SFlow: Video-to-speech Generation With Speech Decomposition And Rectified Flow (2024)
- Improving The Expressiveness Of Neural Vocoding With Non-affine Normalizing Flows (2021)