FloWaveNet: A Generative Flow for Raw Audio
2018 · Sungwon Kim, Sang-Gil Lee, Jongyoon Song, et al.
Abstract
Most modern text-to-speech architectures use a WaveNet vocoder to synthesize high-fidelity waveform audio, but its ancestral sampling scheme limits its practical application, most notably through high inference time. The recently proposed Parallel WaveNet and ClariNet achieve real-time audio synthesis by incorporating inverse autoregressive flow for parallel sampling. However, these approaches require a two-stage training pipeline with a well-trained teacher network, and they produce natural sound only when probability density distillation is combined with auxiliary loss terms. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real time, with clarity comparable to previous two-stage parallel models.
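The abstract's two key claims, that a flow trains with one exact maximum likelihood loss and that it samples in parallel, both follow from the change-of-variables formula, log p(x) = log p(z) + log|det ∂z/∂x|, applied to an invertible map from audio x to latent z. The sketch below illustrates this with a single affine coupling layer. Two assumptions apply. First, the layer type: the abstract does not name it, though coupling layers are a common building block for flow vocoders. Second, the `conditioner` function is hypothetical: a real model would use a deep WaveNet-like network and stack many layers with alternating splits. Unlike WaveNet's ancestral sampling, where sample t must wait for every earlier sample, the inverse below has no sequential dependency across timesteps.

```python
import math
import random

def conditioner(x_a, w_s=0.5, w_t=0.3):
    """Hypothetical stand-in for a deep network: maps x_a to (log-scale, shift)."""
    log_s = [math.tanh(w_s * v) for v in x_a]  # bounded log-scale for stability
    t = [w_t * v for v in x_a]                 # shift
    return log_s, t

def coupling_forward(x):
    """Map audio x -> latent z; return z and log|det Jacobian|."""
    assert len(x) % 2 == 0, "sketch assumes an even number of samples"
    x_a, x_b = x[0::2], x[1::2]                # even samples pass through unchanged
    log_s, t = conditioner(x_a)
    z_b = [xb * math.exp(ls) + tt for xb, ls, tt in zip(x_b, log_s, t)]
    z = [0.0] * len(x)
    z[0::2], z[1::2] = x_a, z_b
    # The Jacobian is triangular with diagonal exp(log_s), so log|det| = sum(log_s).
    return z, sum(log_s)

def coupling_inverse(z):
    """Map latent z -> audio x; every output sample is computed independently."""
    z_a, z_b = z[0::2], z[1::2]
    log_s, t = conditioner(z_a)                # z_a equals x_a, so log_s and t match
    x_b = [(zb - tt) * math.exp(-ls) for zb, ls, tt in zip(z_b, log_s, t)]
    x = [0.0] * len(z)
    x[0::2], x[1::2] = z_a, x_b
    return x

def log_likelihood(x):
    """Exact log p(x) under a standard-normal prior on z (change of variables)."""
    z, log_det = coupling_forward(x)
    log_pz = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_pz + log_det

if __name__ == "__main__":
    random.seed(0)
    audio = [random.uniform(-1.0, 1.0) for _ in range(8)]
    loss = -log_likelihood(audio)              # the single training objective
    z, _ = coupling_forward(audio)
    recon = coupling_inverse(z)
    assert all(abs(a - b) < 1e-9 for a, b in zip(audio, recon))
    # Sampling: draw z ~ N(0, I) for all timesteps at once, then invert.
    sample = coupling_inverse([random.gauss(0.0, 1.0) for _ in range(8)])
    print(f"negative log-likelihood: {loss:.4f}, sample length: {len(sample)}")
```

Note that a single coupling layer leaves half of the samples untouched. This is why flow models stack many layers and alternate which half is transformed. With that stacking, the model stays exactly invertible and its log-determinant stays cheap to compute.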
Related papers
- WaveNet: A Generative Model for Raw Audio (2016)
- FlowVocoder: A Small Footprint Neural Vocoder Based Normalizing Flow for Speech Synthesis (2021)
- WaveGlow: A Flow-based Generative Network for Speech Synthesis (2018)
- Flowtron: An Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis (2020)
- Blow: A Single-scale Hyperconditioned Flow for Non-parallel Raw-audio Voice Conversion (2019)
- Parallel WaveNet: Fast High-fidelity Speech Synthesis (2017)
- Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder (2020)
- Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-resolution Spectrogram (2019)