HiFi-Codec: Group-residual Vector Quantization For High Fidelity Audio Codec
2023 · Dongchao Yang, Songxiang Liu, Rongjie Huang, et al.
Abstract
Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel High Fidelity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train al
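The abstract names group-residual vector quantization but does not spell out the mechanism. The following is a minimal sketch of one reading of the name, under stated assumptions: the latent channels are split into groups, each group is quantized by its own chain of residual codebooks, and the per-group results are concatenated. The function names, the random codebooks, and the choice of 2 groups with 2 residual stages each (which gives 4 codebooks in total) are illustrative assumptions, not details confirmed by the text above.

```python
import numpy as np

def nearest_code(x, codebook):
    """Return the index and vector of the nearest codebook entry for each row of x."""
    # Squared Euclidean distances, shape (frames, codebook_size).
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

def residual_vq(x, codebooks):
    """Quantize x with a chain of codebooks, each coding the previous residual."""
    residual = x.copy()
    quantized = np.zeros_like(x)
    indices = []
    for cb in codebooks:
        idx, q = nearest_code(residual, cb)
        quantized += q   # accumulate the approximation
        residual -= q    # the next stage codes what is left over
        indices.append(idx)
    return np.stack(indices, axis=1), quantized

def group_residual_vq(x, grouped_codebooks):
    """Split channels into groups, residual-quantize each group, then concatenate."""
    groups = np.split(x, len(grouped_codebooks), axis=1)
    all_idx, all_q = [], []
    for g, cbs in zip(groups, grouped_codebooks):
        idx, q = residual_vq(g, cbs)
        all_idx.append(idx)
        all_q.append(q)
    return np.concatenate(all_idx, axis=1), np.concatenate(all_q, axis=1)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
frames, dim, n_groups, n_stages, codebook_size = 5, 8, 2, 2, 16
latents = rng.normal(size=(frames, dim))
# Random codebooks stand in for learned ones.
grouped_codebooks = [
    [rng.normal(size=(codebook_size, dim // n_groups)) for _ in range(n_stages)]
    for _ in range(n_groups)
]

codes, reconstruction = group_residual_vq(latents, grouped_codebooks)
print(codes.shape)  # one code index per frame per codebook
```

In this sketch each frame is represented by `n_groups * n_stages` integer indices, so a downstream generation model would predict four tokens per frame rather than one per stage of a longer residual chain.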
Related papers
- Variable Bitrate Residual Vector Quantization For Audio Coding (2024)
- ERVQ: Enhanced Residual Vector Quantization With Intra-and-inter-codebook Optimization For Neural Audio Codecs (2024)
- Language-Codec: Bridging Discrete Codec Representations And Speech Language Models (2024)
- Spectral Codecs: Improving Non-autoregressive Speech Synthesis With Spectrogram-based Audio Codecs (2024)
- NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization (2024)
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021)
- FreeCodec: A Disentangled Neural Speech Codec With Fewer Tokens (2024)
- Single-Codec: Single-codebook Speech Codec Towards High-performance Speech Generation (2024)