FlexiCodec: A Dynamic Neural Audio Codec For Low Frame Rates
2025 · Jiaqi Li, Yao Qian, Yuxuan Hu, et al.
Abstract
Neural audio codecs are foundational to speech language models. They are expected to have a low frame rate and decoupled semantic and acoustic information. A lower-frame-rate codec can reduce the computational cost of speech language models by shortening the sequence length. Recent studies have developed 12.5Hz low-frame-rate audio codecs, but even lower frame rates remain underexplored. We find that a major challenge for very low frame rate tokens is missing semantic information. This paper introduces FlexiCodec to address this limitation. FlexiCodec improves semantic preservation with a dynamic frame rate approach and introduces a novel architecture featuring ASR feature-assisted dual-stream encoding and Transformer bottlenecks. With dynamic frame rates, it uses fewer frames in information-sparse regions by adaptively merging semantically similar frames. A dynamic frame rate also allows FlexiCodec to support inference-time controllable frame rates between 3Hz and 12.5Hz. Ex
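The idea of merging semantically similar frames can be illustrated with a minimal sketch. This is not FlexiCodec's actual algorithm: the adjacent-frame cosine-similarity rule, the mean pooling, and the names `merge_similar_frames`, `threshold`, and `max_run` are all assumptions chosen for illustration. The sketch only shows how merging runs of similar frames lowers the effective frame rate below a fixed base rate such as 12.5Hz.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def merge_similar_frames(frames, threshold=0.9, max_run=4):
    """Greedily merge runs of adjacent, similar frames (hypothetical rule).

    A frame joins the current run when its cosine similarity to the
    previous frame is at least `threshold` and the run holds fewer than
    `max_run` frames. Each run is replaced by its element-wise mean.
    Returns (merged_frames, run_lengths).
    """
    merged, lengths, run = [], [], []

    def flush():
        if run:
            dim = len(run[0])
            merged.append([sum(f[i] for f in run) / len(run) for i in range(dim)])
            lengths.append(len(run))

    for frame in frames:
        if run and len(run) < max_run and cosine(run[-1], frame) >= threshold:
            run.append(frame)
        else:
            flush()
            run = [frame]
    flush()
    return merged, lengths


def effective_frame_rate(base_rate_hz, lengths):
    """Average number of merged frames per second."""
    return base_rate_hz * len(lengths) / sum(lengths)


# Toy 2-dimensional "semantic features" at an assumed 12.5Hz base rate.
frames = [[1, 0], [1, 0.01], [0, 1], [0, 1], [0, 1], [1, 1]]
merged, lengths = merge_similar_frames(frames, threshold=0.9, max_run=4)
rate = effective_frame_rate(12.5, lengths)
```

In this toy input the six frames collapse into three merged frames, so the effective rate is half the base rate. Under these assumptions, raising `threshold` merges fewer frames and moves the rate back toward the base rate, which is one way an inference-time control over frame rate could be exposed.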
Related papers
- CodecSlime: Temporal Redundancy Compression Of Neural Speech Codec Via Dynamic Frame Rate (2025) · 0.00
- Low Frame-rate Speech Codec: A Codec Designed For Fast High-quality Speech LLM Training And Inference (2024) · 5.24
- FreeCodec: A Disentangled Neural Speech Codec With Fewer Tokens (2024) · 4.52
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec For General Sound (2024) · 10.97
- PhoenixCodec: Taming Neural Speech Coding For Extreme Low-resource Scenarios (2025) · 0.00
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024) · 0.00
- FunCodec: A Fundamental, Reproducible And Integrable Open-source Toolkit For Neural Speech Codec (2023) · 17.47
- Latent-domain Predictive Neural Speech Coding (2022) · 12.15