Neural Speech Coding For Real-time Communications Using Constant Bitrate Scalar Quantization
2024 · Andreas Brendel, Nicola Pia, Kishan Gupta, et al.
Abstract
Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook st
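The projected scalar quantization idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the encoder output is linearly projected to a few dimensions, bounded with tanh, and rounded to a fixed number of uniform levels per dimension. The function names, the projection matrix, and the level count are all hypothetical, and training-time gradient handling is omitted.

```python
import math

def project(vec, matrix):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def scalar_quantize(z, levels):
    """Quantize each value independently (assumed scheme, not from the paper).

    Each value is bounded to [-1, 1] with tanh and snapped to the nearest of
    `levels` uniformly spaced points. Returns (indices, dequantized values).
    """
    indices, values = [], []
    for x in z:
        bounded = math.tanh(x)
        # Map [-1, 1] to [0, levels - 1] and round to an integer index.
        idx = round((bounded + 1.0) / 2.0 * (levels - 1))
        indices.append(idx)
        # Map the index back to the reconstruction point in [-1, 1].
        values.append(2.0 * idx / (levels - 1) - 1.0)
    return indices, values

def bits_per_frame(dims, levels):
    """Every frame costs the same number of bits, hence a constant bitrate."""
    return dims * math.log2(levels)

# Hypothetical 4-dimensional encoder output projected down to 2 dimensions.
latent = [0.3, -1.2, 2.0, 0.1]
down = [[0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5]]
indices, dequantized = scalar_quantize(project(latent, down), levels=5)
```

Because every dimension always uses the same number of levels, the cost per frame is fixed at `dims * log2(levels)` bits, and no codebook has to be learned or maintained.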
Related papers
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Variable Bitrate Residual Vector Quantization For Audio Coding (2024)
- Latent-domain Predictive Neural Speech Coding (2022)
- NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization (2024)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization (2020)
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021)
- On The Relation Between Speech Quality And Quantized Latent Representations Of Neural Codecs (2025)