Code Drift: Towards Idempotent Neural Audio Codecs
2024 · Patrick O'Reilly, Prem Seetharaman, Jiaqi Su, et al.
Abstract
Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates. The token-based representations produced by these codecs have proven particularly useful for generative modeling. While much research has focused on improvements in compression ratio and perceptual transparency, recent works have largely overlooked another desirable codec property -- idempotence, the stability of compressed outputs under multiple rounds of encoding. We find that state-of-the-art neural codecs exhibit varied degrees of idempotence, with some degrading audio outputs significantly after as few as three encodings. We investigate possible causes of low idempotence and devise a method for improving idempotence through fine-tuning a codec model. We then examine the effect of idempotence on a simple conditional generative modeling task, and find that increased idempotence can be achieved without negatively impacting downstream modeling performance -- potentially ex
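As an illustration of what idempotence means in practice, the following is a minimal sketch that re-encodes a signal several times and records how many codes change on each pass. It does not reproduce the paper's codecs, metrics, or fine-tuning method: the toy quantizer, the 3-tap smoothing filter, and all function names (`encode`, `decode`, `code_drift`) are assumptions chosen for illustration only.

```python
import numpy as np

def encode(x, step=0.05):
    # Toy encoder (assumption): uniform scalar quantization to integer codes.
    return np.round(x / step).astype(np.int64)

def decode(codes, step=0.05, smooth=False):
    # Toy decoder (assumption): dequantize, then optionally apply a 3-tap
    # smoothing filter that stands in for a decoder that is not an exact
    # inverse of the encoder.
    x = codes.astype(np.float64) * step
    if smooth:
        x = np.convolve(x, np.array([0.25, 0.5, 0.25]), mode="same")
    return x

def code_drift(x, passes=5, smooth=False):
    """Return the fraction of codes that change on each re-encoding pass."""
    codes = encode(x)
    drift = []
    for _ in range(passes):
        new_codes = encode(decode(codes, smooth=smooth))
        drift.append(float(np.mean(new_codes != codes)))
        codes = new_codes
    return drift

# Synthetic test signal: a 440 Hz tone plus a little noise, 1 s at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.05 * rng.standard_normal(t.size)

idempotent_drift = code_drift(signal, smooth=False)  # decoder inverts encoder
lossy_drift = code_drift(signal, smooth=True)        # decoder alters the signal

print("drift per pass, exact decoder:    ", idempotent_drift)
print("drift per pass, smoothing decoder:", lossy_drift)
```

In this sketch, a codec whose decode step is exactly undone by its encode step shows no change in codes across passes, while the smoothing variant keeps altering them. That change in codes across repeated encodings is the kind of instability the abstract refers to as low idempotence.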
Related papers
- FreeCodec: A Disentangled Neural Speech Codec With Fewer Tokens (2024)
- Neural Speech And Audio Coding: Modern AI Technology Meets Traditional Codecs (2024)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- ComplexDec: A Domain-robust High-fidelity Neural Audio Codec With Complex Spectrum Modeling (2025)
- FlexiCodec: A Dynamic Neural Audio Codec For Low Frame Rates (2025)
- Modeling Strategies For Speech Enhancement In The Latent Space Of A Neural Audio Codec (2025)
- Towards Evaluating Generative Audio: Insights From Neural Audio Codec Embedding Distances (2025)
- Analyzing And Mitigating Inconsistency In Discrete Audio Tokens For Neural Codec Language Models (2024)