On The Relation Between Speech Quality And Quantized Latent Representations Of Neural Codecs
2025 · Mhd Modar Halimeh, Matteo Torcoli, Philipp Grundhuber, et al.
Abstract
Neural audio signal codecs have attracted significant attention in recent years. In essence, the impressive low bitrate achieved by such encoders is enabled by learning an abstract representation that captures the properties of encoded signals, e.g., speech. In this work, we investigate the relation between the latent representation of the input signal learned by a neural codec and the quality of speech signals. To do so, we introduce Latent-representation-to-Quantization error Ratio (LQR) measures, which quantify the distance from the idealized neural codec's speech signal model for a given speech signal. We compare the proposed metrics to intrusive measures as well as data-driven supervised methods using two subjective speech quality datasets. This analysis shows that the proposed LQR correlates strongly (up to 0.9 Pearson's correlation) with the subjective quality of speech. Despite being a non-intrusive metric, this yields a competitive performance with, or even better than, other
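The abstract does not give the formula for the LQR measures, so the following is only a minimal sketch of the general idea, assuming an SNR-like ratio between the energy of the unquantized latent and the energy of its quantization error. The function names (`nearest_codeword`, `lqr_db`, `pearson`), the single-codebook nearest-neighbour quantizer, and the dB scaling are all illustrative assumptions, not the authors' definition.

```python
import numpy as np


def nearest_codeword(latent, codebook):
    """Quantize each latent frame to its nearest codebook vector.

    latent:   array of shape (T, D), one D-dimensional vector per frame.
    codebook: array of shape (K, D).
    Assumption: a single codebook with Euclidean nearest-neighbour lookup.
    """
    latent = np.asarray(latent, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared distances between every frame and every codeword: shape (T, K).
    dists = np.sum((latent[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    return codebook[np.argmin(dists, axis=1)]


def lqr_db(latent, quantized, eps=1e-12):
    """Hypothetical LQR: latent energy over quantization-error energy, in dB.

    This SNR-like form is an assumption made for illustration only.
    """
    latent = np.asarray(latent, dtype=float)
    quantized = np.asarray(quantized, dtype=float)
    error = latent - quantized
    latent_energy = np.sum(latent ** 2)
    error_energy = np.sum(error ** 2)
    return 10.0 * np.log10((latent_energy + eps) / (error_energy + eps))


def pearson(x, y):
    """Pearson correlation between metric values and subjective scores."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy usage: latents that sit close to the codebook score a higher ratio.
codebook = np.array([[1.0, 0.0], [0.0, 1.0]])
close_latent = np.array([[0.9, 0.1], [0.1, 0.9]])
far_latent = np.array([[0.6, 0.4], [0.4, 0.6]])
close_score = lqr_db(close_latent, nearest_codeword(close_latent, codebook))
far_score = lqr_db(far_latent, nearest_codeword(far_latent, codebook))
```

Under these assumptions, a latent far from every codeword yields a lower ratio. That matches the abstract's description of the measure as a distance from the codec's idealized speech model. Correlating such per-signal values with subjective ratings via `pearson` mirrors the evaluation the abstract reports.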
Related papers
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Latent-domain Predictive Neural Speech Coding (2022)
- Neural Speech Coding For Real-time Communications Using Constant Bitrate Scalar Quantization (2024)
- Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization (2020)
- Speech Quality Factors For Traditional And Neural-based Low Bit Rate Vocoders (2020)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- Spectral Codecs: Improving Non-autoregressive Speech Synthesis With Spectrogram-based Audio Codecs (2024)
- ERVQ: Enhanced Residual Vector Quantization With Intra-and-inter-codebook Optimization For Neural Audio Codecs (2024)