ERVQ: Enhanced Residual Vector Quantization With Intra-and-inter-codebook Optimization For Neural Audio Codecs
2024 · Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, et al.
Abstract
Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To address this problem, we introduce ERVQ, Enhanced Residual Vector Quantization, a novel enhancement strategy for the RVQ framework in neural audio codecs. ERVQ mitigates codebook collapse and boosts codec performance through both intra- and inter-codebook optimization. Intra-codebook optimization incorporates an online clustering strategy and a code balancing loss to ensure balanced and efficient codebook utilization. Inter-codebook optimization improves the diversity of quantized features by minimizing the similarity between successive quantizations. Our experiments show that ERVQ significantly enhances audio codec performance across different models, sampling rates, and bitrates, achieving superior quality and generalization capabilities. It also ac
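The quantities the abstract refers to can be made concrete with a small sketch. The block below is not the ERVQ method itself: it implements plain RVQ with NumPy and then measures the two things the abstract says ERVQ targets, namely how evenly each codebook is used (a collapsed codebook has low usage and low perplexity) and how similar successive quantizer outputs are. All function names, the random codebooks, and the sizes `T`, `D`, `K`, `N` are assumptions chosen for illustration only.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Plain residual vector quantization (illustrative, not ERVQ).

    x: (T, D) array of frames; codebooks: list of (K, D) arrays.
    Returns code indices (N, T) and per-stage quantized outputs (N, T, D).
    """
    residual = x.copy()
    indices, stages = [], []
    for cb in codebooks:
        # Squared Euclidean distance from every residual frame to every code.
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        q = cb[idx]
        indices.append(idx)
        stages.append(q)
        # The next quantizer sees what this one failed to explain.
        residual = residual - q
    return np.stack(indices), np.stack(stages)

def codebook_usage(idx, num_codes):
    """Fraction of codes selected at least once, and the perplexity
    of the empirical code distribution (1 = collapsed, K = uniform)."""
    counts = np.bincount(idx, minlength=num_codes)
    p = counts / counts.sum()
    nz = p[p > 0]
    perplexity = float(np.exp(-(nz * np.log(nz)).sum()))
    return float((counts > 0).mean()), perplexity

def successive_cosine(stages, eps=1e-8):
    """Mean cosine similarity between the outputs of stage i and stage i+1."""
    a, b = stages[:-1], stages[1:]
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return (num / den).mean(axis=-1)

# Hypothetical toy setup: random frames and random, progressively smaller codebooks.
rng = np.random.default_rng(0)
T, D, K, N = 200, 8, 16, 3
x = rng.normal(size=(T, D))
codebooks = [rng.normal(size=(K, D)) * 0.5 ** i for i in range(N)]

indices, stages = rvq_encode(x, codebooks)
for i in range(N):
    used, ppl = codebook_usage(indices[i], K)
    print(f"codebook {i}: used fraction={used:.2f}, perplexity={ppl:.2f}")
print("successive-stage cosine similarity:", successive_cosine(stages))
```

In a trained codec, a balancing objective of the kind the abstract describes would aim to push the perplexity toward the codebook size, and an inter-codebook objective would aim to push the successive-stage similarity down; the exact losses used by ERVQ are not given in this excerpt.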
Related papers
- Variable Bitrate Residual Vector Quantization For Audio Coding (2024) 3.58
- NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization (2024) 0.00
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021) 10.21
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023) 6.34
- ESC: Efficient Speech Coding With Cross-scale Residual Vector Quantized Transformers (2024) 5.84
- Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization (2020) 8.60
- Hifi-codec: Group-residual Vector Quantization For High Fidelity Audio Codec (2023) 0.00
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024) 0.00