Cascaded Cross-module Residual Learning Towards Lightweight End-to-end Speech Coding
2019 · Kai Zhen, Jongmo Sung, Mi Suk Lee, et al.
Abstract
Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier, with each module reconstructing the residual from its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem in a single large neural network, it optimizes a series of less complicated modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and the state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but it is also compatible with linear predictive coding (LPC), showing better subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved by using onl
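The cascading idea in the abstract, where each module reconstructs the residual left by the modules before it, can be illustrated with a toy sketch. This is not the paper's method: the neural modules and the two-phase training scheme are replaced here by plain uniform scalar quantizers, and the names `quantize`, `cascade_encode_decode`, and the step sizes are hypothetical choices made only for illustration.

```python
import numpy as np


def quantize(x, step):
    """Hypothetical stand-in for one coding module: uniform scalar quantization."""
    return np.round(x / step) * step


def cascade_encode_decode(x, steps):
    """Run a cascade in which each stage codes the residual of all earlier stages.

    Returns the summed reconstruction and the mean squared residual after each stage.
    """
    residual = x.copy()
    reconstruction = np.zeros_like(x)
    errors = []
    for step in steps:
        stage_out = quantize(residual, step)  # this stage's estimate of the residual
        reconstruction += stage_out           # decoder sums the outputs of all stages
        residual = x - reconstruction         # what is left for the next stage
        errors.append(float(np.mean(residual ** 2)))
    return reconstruction, errors


rng = np.random.default_rng(0)
x = rng.standard_normal(512)  # toy signal frame, not real speech
recon, errors = cascade_encode_decode(x, steps=[0.5, 0.125, 0.03125])
print(errors)  # mean squared residual after each stage
```

In this sketch each added stage can only shrink the remaining error, which mirrors the motivation for splitting the codec into several simpler modules instead of one large network.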
Related papers
- Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization (2020)
- Msr-codec: A Low-bitrate Multi-stream Residual Codec For High-fidelity Speech Generation With Information Disentanglement (2025)
- Neural Feature Predictor And Discriminative Residual Coding For Low-bitrate Speech Coding (2022)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- Speaking From Coarse To Fine: Improving Neural Codec Language Model Via Multi-scale Speech Coding And Generation (2024)
- ESC: Efficient Speech Coding With Cross-scale Residual Vector Quantized Transformers (2024)
- Language-codec: Bridging Discrete Codec Representations And Speech Language Models (2024)
- Composition Of Deep And Spiking Neural Networks For Very Low Bit Rate Speech Coding (2016)