Audio Dequantization For High Fidelity Audio Generation In Flow-based Neural Vocoder
2020 · Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, et al.
Abstract
Flow-based neural vocoders have recently shown significant improvements in real-time speech generation. A sequence of invertible flow operations lets the model convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade model performance because of the topological difference between the latent and actual distributions. To resolve this problem, we propose audio dequantization methods for flow-based neural vocoders, aimed at high-fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. We therefore implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct objective performance assessments and a subjective evaluation to show that audio dequantization can improve audio generation quality. From our experiments, using audio dequantization
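To make the idea of dequantization concrete, here is a minimal sketch of uniform dequantization, the simplest form known from image generation. The abstract does not say which methods the paper implements, so this is an illustrative assumption rather than the authors' method. The function names `dequantize_uniform` and `requantize` and the 16-bit signed PCM format are likewise hypothetical choices.

```python
import math
import random


def dequantize_uniform(samples, bits=16, rng=None):
    """Map signed integer PCM samples to continuous values in [-1, 1).

    Assumption for illustration: each integer k stands for the bin
    [k, k + 1). Adding noise u ~ U[0, 1) spreads the point mass at k
    over that bin, so a continuous density model no longer has to fit
    a set of isolated points.
    """
    rng = rng or random.Random()
    half = 2 ** (bits - 1)  # 32768 for 16-bit audio
    return [(k + rng.random()) / half for k in samples]


def requantize(values, bits=16):
    """Invert dequantize_uniform by flooring back to the integer bin."""
    half = 2 ** (bits - 1)
    return [math.floor(v * half) for v in values]


if __name__ == "__main__":
    pcm = [-32768, -1, 0, 1, 32767]
    continuous = dequantize_uniform(pcm, rng=random.Random(0))
    print(continuous)
    print(requantize(continuous) == pcm)
```

Because the noise stays inside each sample's own bin, flooring recovers the original integers, so the discrete data remains recoverable from what the model is trained on.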
Related papers
- Flowvocoder: A Small Footprint Neural Vocoder Based Normalizing Flow For Speech Synthesis (2021)
- Flowdec: A Flow-based Full-band General Audio Codec With High Perceptual Quality (2025)
- Flowavenet: A Generative Flow For Raw Audio (2018)
- NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization (2024)
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Flowmac: Conditional Flow Matching For Audio Coding At Low Bit Rates (2024)
- Universr: Unified And Versatile Audio Super-resolution Via Vocoder-free Flow Matching (2025)
- Flowhigh: Towards Efficient And High-quality Audio Super-resolution With Single-step Flow Matching (2025)