ScoreDec: A Phase-preserving High-fidelity Audio Codec with a Generalized Score-based Diffusion Post-filter
2024 · Yi-Chiao Wu, Dejan Marković, Steven Krenn, et al.
Abstract
Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality at very low bitrates, the quality gap between coded and natural audio is still significant. Generative adversarial network (GAN) training is usually required for these E2E neural codecs because of the difficulty of direct phase modeling. However, such adversarial learning hinders these codecs from preserving the original phase information. To achieve human-level naturalness at a reasonable bitrate, preserve the original phase, and avoid tricky and opaque GAN training, we develop a score-based diffusion post-filter (SPF) in the complex spectral domain and combine our previous AudioDec with the SPF to propose ScoreDec, which can be trained using only spectral and score-matching losses. Both the objective and subjective experimental results show that ScoreDec with a 24 kbps bitrate encodes and decodes full-band 48 kHz speech with human-level naturalness
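The score-matching training mentioned in the abstract can be illustrated with a minimal sketch of a denoising score-matching loss on a complex spectrogram. This is not the authors' implementation: the function name `dsm_loss`, the single fixed noise level `sigma`, the plain Gaussian perturbation, and the toy score functions are all assumptions made for illustration.

```python
import numpy as np


def dsm_loss(score_fn, clean_spec, sigma, rng):
    """Denoising score-matching loss at one noise level (illustrative only).

    score_fn   : hypothetical callable (noisy_spec, sigma) -> complex array
    clean_spec : complex array standing in for an STFT of clean audio
    sigma      : standard deviation of the noise on each real/imaginary part
    rng        : numpy.random.Generator
    """
    # Complex Gaussian noise: independent real and imaginary parts.
    noise = (rng.standard_normal(clean_spec.shape)
             + 1j * rng.standard_normal(clean_spec.shape))
    noisy_spec = clean_spec + sigma * noise

    # Score of the Gaussian perturbation kernel p(noisy | clean):
    # -(noisy - clean) / sigma^2, which equals -noise / sigma.
    target = -noise / sigma

    pred = score_fn(noisy_spec, sigma)
    return float(np.mean(np.abs(pred - target) ** 2))


# Toy usage with random data in place of a real spectrogram (assumption).
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), chosen arbitrarily
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
sigma = 0.5


def oracle_score(x, s):
    # Exact score when the "dataset" is the single point `clean`.
    return -(x - clean) / s ** 2


def zero_score(x, s):
    # A predictor that has learned nothing.
    return np.zeros_like(x)


oracle_loss = dsm_loss(oracle_score, clean, sigma, rng)
zero_loss = dsm_loss(zero_score, clean, sigma, rng)
```

In this toy setup the oracle should drive the loss to numerical zero, while the all-zero predictor should land near 2 / sigma^2, the expected squared magnitude of the target. A trained network would sit between the two.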
Related papers
- FlowDec: A Flow-based Full-band General Audio Codec with High Perceptual Quality (2025)
- ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling (2025)
- EDMSound: Spectrogram Based Diffusion Models for Efficient and High-quality Audio Synthesis (2023)
- APCodec+: A Spectrum-coding-based High-fidelity and High-compression-rate Neural Audio Codec with Staged Training Paradigm (2024)
- PostGAN: A GAN-based Post-processor to Enhance the Quality of Coded Speech (2022)
- STFTCodec: High-fidelity Audio Compression Through Time-frequency Domain Representation (2025)
- MDCTCodec: A Lightweight MDCT-based Neural Audio Codec Towards High Sampling Rate and Low Bitrate Scenarios (2024)
- Audio Generation Through Score-based Generative Modeling: Design Principles and Implementation (2025)