MDCTCodec: A Lightweight MDCT-based Neural Audio Codec Towards High Sampling Rate And Low Bitrate Scenarios
2024 · Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, et al.
Abstract
In this paper, we propose MDCTCodec, an efficient, lightweight, end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of the audio as input and encodes it into a continuous latent code, which is then discretized by a residual vector quantizer (RVQ). The decoder then decodes the MDCT spectrum from the quantized latent code and reconstructs the audio via the inverse MDCT. During training, a novel multi-resolution MDCT-based discriminator (MR-MDCTD) is adopted to distinguish natural from decoded MDCT spectra for adversarial training. Experimental results confirm that, in scenarios with high sampling rates and low bitrates, MDCTCodec exhibits high decoded audio quality, improved training and generation efficiency, and a compact model size compared to baseline codecs. Specifically, MDCTCodec achieves a ViSQOL score of 4.18 at a sampling rate of 48 kHz and a bitrate of 6 kbps on the public VCTK corpus.
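The two signal-processing building blocks named in the abstract, the MDCT / inverse MDCT pair and residual vector quantization, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the neural encoder, decoder, and discriminator are omitted, and the frame size `N`, the sine window, the helper names, and the hand-written codebooks are all assumptions chosen for illustration. In the paper the RVQ operates on the encoder's latent code, whereas the toy call here quantizes a plain vector.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct_basis(N):
    """N x 2N cosine basis of the MDCT (N assumed even)."""
    return [[math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
             for n in range(2 * N)] for k in range(N)]

def mdct(x, N):
    """Windowed MDCT with hop N; returns a list of frames with N coefficients each."""
    w, C = sine_window(N), mdct_basis(N)
    pad = (-len(x)) % N
    # Pad N zeros on each side so every input sample is covered by two frames.
    x = [0.0] * N + list(x) + [0.0] * (pad + N)
    frames = []
    for start in range(0, len(x) - N, N):
        seg = [x[start + n] * w[n] for n in range(2 * N)]
        frames.append([sum(C[k][n] * seg[n] for n in range(2 * N))
                       for k in range(N)])
    return frames

def imdct(X, N, length):
    """Inverse MDCT with windowed overlap-add; aliasing cancels between frames."""
    w, C = sine_window(N), mdct_basis(N)
    y = [0.0] * ((len(X) + 1) * N)
    for i, coeffs in enumerate(X):
        for n in range(2 * N):
            s = sum(coeffs[k] * C[k][n] for k in range(N))
            y[i * N + n] += (2.0 / N) * s * w[n]
    return y[N:N + length]  # drop the leading padding, trim to the input length

def rvq_encode(z, codebooks):
    """Quantize z stage by stage; each stage codes the previous stage's residual."""
    residual, indices = list(z), []
    for cb in codebooks:
        idx = min(range(len(cb)),
                  key=lambda j: sum((r - c) ** 2 for r, c in zip(residual, cb[j])))
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword of every stage."""
    dim = len(codebooks[0][0])
    return [sum(cb[i][d] for i, cb in zip(indices, codebooks)) for d in range(dim)]

# Toy usage: MDCT round trip on a short sinusoid (hypothetical signal and N).
signal = [math.sin(0.3 * t) for t in range(50)]
spectrum = mdct(signal, 8)
restored = imdct(spectrum, 8, len(signal))
print("max reconstruction error:",
      max(abs(a - b) for a, b in zip(signal, restored)))
```

Without quantization the MDCT round trip is lossless up to floating-point error, so in a codec of this kind the loss comes from the quantized latent code. Each RVQ stage adds one codebook index per frame, which is why the number of stages and the codebook size together set the bitrate.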
Related papers
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- A DNN Based Post-filter To Enhance The Quality Of Coded Speech In MDCT Domain (2022)
- ComplexDec: A Domain-robust High-fidelity Neural Audio Codec With Complex Spectrum Modeling (2025)
- NDVQ: Robust Neural Audio Codec With Normal Distribution-based Vector Quantization (2024)
- Latent-domain Predictive Neural Speech Coding (2022)
- MSR-Codec: A Low-bitrate Multi-stream Residual Codec For High-fidelity Speech Generation With Information Disentanglement (2025)
- Variable Bitrate Residual Vector Quantization For Audio Coding (2024)
- STFTCodec: High-fidelity Audio Compression Through Time-frequency Domain Representation (2025)