Latent-domain Predictive Neural Speech Coding
2022 · Xue Jiang, Xiulian Peng, Huaying Xue, et al.
Abstract
Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs encode either acoustic features or blind features learned by a convolutional neural network, which leaves temporal redundancies within the encoded features. This paper introduces latent-domain predictive coding into the VQ-VAE framework to fully remove such redundancies and proposes TF-Codec, an end-to-end neural codec for low-latency speech coding. Specifically, the extracted features are encoded conditioned on a prediction from past quantized latent frames, so that temporal correlations are further removed. Moreover, we introduce a learnable compression on the time-frequency input to adaptively adjust the attention paid to main frequencies and details at different bitrates. A differentiable vector quantization scheme based on distance-to-soft mapping and Gumbel-Softmax is
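The abstract's two core mechanisms, predicting the current latent frame from past quantized frames and making vector quantization differentiable with a distance-to-soft mapping plus Gumbel noise, can be illustrated with a minimal pure-Python sketch. This is not TF-Codec's implementation. The one-tap predictor `predict` with fixed weight `PRED_COEF`, the residual (subtract-the-prediction) form of conditioning, the toy 2-D grid codebook, and all function names are hypothetical assumptions chosen for illustration. In the paper, the predictor and encoder are learned networks, and the encoder is conditioned on the prediction rather than simply subtracting it.

```python
import math
import random

PRED_COEF = 0.9  # hypothetical fixed predictor weight (the paper learns its predictor)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(z, codebook, tau=1.0, rng=None):
    """Training-time quantizer: distance-to-soft mapping, optionally with Gumbel noise.

    Negative squared distances serve as logits; if rng is given, Gumbel noise
    -log(-log(u)) is added (Gumbel-Softmax). The result is a convex combination
    of codewords, so in an autodiff framework gradients reach z and the codebook.
    """
    logits = []
    for c in codebook:
        g = 0.0
        if rng is not None:
            u = max(rng.random(), 1e-12)  # avoid log(0)
            g = -math.log(-math.log(u))
        logits.append((-sq_dist(z, c) + g) / tau)
    probs = softmax(logits)
    return [sum(p * c[i] for p, c in zip(probs, codebook)) for i in range(len(z))]


def hard_quantize(z, codebook):
    """Inference-time quantizer: index of the nearest codeword (what is transmitted)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))


def predict(history, dim):
    """Hypothetical one-tap linear prediction from past *quantized* latent frames."""
    if not history:
        return [0.0] * dim
    return [PRED_COEF * x for x in history[-1]]


def encode(latents, codebook):
    """Quantize each frame's prediction residual; return indices and reconstructions."""
    history, indices = [], []
    for z in latents:
        p = predict(history, len(z))
        residual = [a - b for a, b in zip(z, p)]
        k = hard_quantize(residual, codebook)
        indices.append(k)
        # Mirror the decoder so both sides predict from the same quantized history.
        history.append([a + b for a, b in zip(p, codebook[k])])
    return indices, history


def decode(indices, codebook, dim):
    """Rebuild latent frames from indices using the same predictor as the encoder."""
    history = []
    for k in indices:
        p = predict(history, dim)
        history.append([a + b for a, b in zip(p, codebook[k])])
    return history


if __name__ == "__main__":
    # Toy 9x9 grid codebook on [-1, 1]^2 with step 0.25 (assumption).
    codebook = [[x * 0.25, y * 0.25] for x in range(-4, 5) for y in range(-4, 5)]
    latents = [[1.0, 0.5], [0.95, 0.5], [0.9, 0.45], [0.85, 0.45]]
    idx, _ = encode(latents, codebook)
    print("indices:", idx)
    print("decoded:", decode(idx, codebook, dim=2))
    print("soft:", soft_quantize([0.3, -0.1], codebook, tau=0.5, rng=random.Random(0)))
```

The encoder runs the decoder's reconstruction loop internally, so every prediction is made from the same quantized history the decoder will have. Predicting from the unquantized latents instead would let quantization error drift between the two sides. After the first frame, the residuals of these slowly varying toy latents fall near zero, which is the temporal redundancy that predictive coding is meant to remove.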
Related papers
- Neural Feature Predictor And Discriminative Residual Coding For Low-bitrate Speech Coding (2022)
- Disentangled Feature Learning For Real-time Neural Speech Coding (2022)
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Low Bit-rate Speech Coding With VQ-VAE And A WaveNet Decoder (2019)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- ESC: Efficient Speech Coding With Cross-scale Residual Vector Quantized Transformers (2024)
- Freecodec: A Disentangled Neural Speech Codec With Fewer Tokens (2024)
- Neural Speech Coding For Real-time Communications Using Constant Bitrate Scalar Quantization (2024)