NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization
2024 · Zhikang Niu, Sanyuan Chen, Long Zhou, et al.
Abstract
Built upon vector quantization (VQ), discrete audio codec models have achieved great success in audio compression and auto-regressive audio generation. However, existing models face substantial challenges in perceptual quality and signal distortion, especially when operating in extremely low bandwidth, rooted in the sensitivity of the VQ codebook to noise. This degradation poses significant challenges for several downstream tasks, such as codec-based speech synthesis. To address this issue, we propose a novel VQ method, Normal Distribution-based Vector Quantization (NDVQ), by introducing an explicit margin between the VQ codes via learning a variance. Specifically, our approach involves mapping the waveform to a latent space and quantizing it by selecting the most likely normal distribution, with each codebook entry representing a unique normal distribution defined by its mean and variance. Using these distribution-based VQ codec codes, a decoder reconstructs the input waveform. NDVQ i
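The selection step described in the abstract, choosing the most likely normal distribution among codebook entries that each carry a mean and a variance, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a diagonal covariance per entry and an argmax over Gaussian log-densities, and the names `gaussian_log_likelihood`, `quantize_most_likely`, `quantize_nearest_mean`, and the toy `codebook` are all hypothetical. A plain nearest-mean lookup is included only for contrast.

```python
import math


def gaussian_log_likelihood(z, mean, var):
    """Log-density of z under a diagonal Gaussian N(mean, diag(var)).

    Assumption: each codebook entry uses a diagonal covariance.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(z, mean, var)
    )


def quantize_most_likely(z, codebook):
    """Index of the entry whose distribution assigns z the highest density."""
    scores = [gaussian_log_likelihood(z, mean, var) for mean, var in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)


def quantize_nearest_mean(z, codebook):
    """Conventional VQ lookup for contrast: nearest mean, variance ignored."""
    dists = [
        sum((x - m) ** 2 for x, m in zip(z, mean))
        for mean, _ in codebook
    ]
    return min(range(len(codebook)), key=dists.__getitem__)


# Hypothetical 2-D codebook of (mean, variance) pairs.
codebook = [
    ([0.0, 0.0], [0.01, 0.01]),  # entry 0: narrow distribution
    ([1.0, 1.0], [1.0, 1.0]),    # entry 1: wide distribution
]

z = [0.4, 0.4]  # a latent vector lying between the two means
likely_index = quantize_most_likely(z, codebook)
nearest_index = quantize_nearest_mean(z, codebook)
```

With these toy values, `z` is closer to the mean of entry 0 in Euclidean distance, yet entry 1 is the more likely distribution because entry 0 has a very small variance. In this sketch the two lookups therefore disagree, which shows how a learned variance can change which code a latent vector maps to.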
Related papers
- Variable Bitrate Residual Vector Quantization For Audio Coding (2024) 3.58
- ERVQ: Enhanced Residual Vector Quantization With Intra-and-inter-codebook Optimization For Neural Audio Codecs (2024) 6.34
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021) 10.21
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023) 6.34
- MDCTCodec: A Lightweight MDCT-based Neural Audio Codec Towards High Sampling Rate And Low Bitrate Scenarios (2024) 8.09
- Neural Speech Coding For Real-time Communications Using Constant Bitrate Scalar Quantization (2024) 0.00
- Latent-domain Predictive Neural Speech Coding (2022) 12.15
- HiFi-Codec: Group-residual Vector Quantization For High Fidelity Audio Codec (2023) 0.00