ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
2025 · Dongchao Yang, Songxiang Liu, Haohan Guo, et al.
Abstract
Recent advancements in audio language models have underscored the pivotal role of audio tokenization, which converts audio signals into discrete tokens, thereby facilitating the application of language model architectures to the audio domain. In this study, we introduce ALMTokenizer, a novel low-bitrate and semantically rich audio codec tokenizer for audio language models. Prior methods, such as EnCodec, typically encode individual audio frames into discrete tokens without using context information across frames. Unlike these methods, we introduce a novel query-based compression strategy that captures holistic information with a set of learnable query tokens by explicitly modeling the context across frames. This design not only enables the codec model to capture more semantic information but also represents the audio signal with shorter token sequences. Additionally, to enhance the semantic information in audio codec models, we introduce the following: (1) A masked
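The abstract describes query-based compression only at a high level, so the Python sketch below illustrates the general idea under stated assumptions rather than reproducing ALMTokenizer's architecture. A small set of query vectors (random here; learned in a real model) each attend over *all* frame embeddings with scaled dot-product attention, so the number of outputs is fixed by the number of queries rather than the number of frames. A plain single-codebook nearest-neighbour quantizer then maps each output to a discrete token ID. The helper names (`query_compress`, `quantize`, `bitrate_bps`), the one-query-per-five-frames ratio, the 50 Hz frame rate, and the 1024-entry codebook are all hypothetical choices for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_compress(frames, queries):
    """Summarize T frame vectors into len(queries) vectors (hypothetical helper).

    Each query attends over every frame, so each output mixes context from
    the whole sequence instead of encoding one frame in isolation.
    Assumption: a single attention step stands in for a full encoder.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, f) * scale for f in frames])
        # Attention-weighted sum of frames: one summary vector per query.
        outputs.append([sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)])
    return outputs

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (plain VQ)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k])) for v in vectors]

def bitrate_bps(tokens_per_second, num_codebooks, codebook_size):
    """Bits per second of a token stream: rate x codebooks x log2(codebook size)."""
    return tokens_per_second * num_codebooks * math.log2(codebook_size)

if __name__ == "__main__":
    rng = random.Random(0)
    d, num_frames, window = 8, 50, 5          # hypothetical: 1 s of audio at 50 frames/s
    num_queries = num_frames // window        # hypothetical: one query per 5 frames -> 10 queries
    frames = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_frames)]
    queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_queries)]  # learned in practice
    codebook = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(1024)]

    tokens = quantize(query_compress(frames, queries), codebook)
    print(len(frames), "frames ->", len(tokens), "tokens")
    print("frame-wise bitrate:", bitrate_bps(50, 1, 1024), "bps")    # 500.0
    print("query-based bitrate:", bitrate_bps(10, 1, 1024), "bps")   # 100.0
```

With these assumed settings, 50 frames become 10 tokens, and the bitrate (tokens per second × codebooks × log2 of codebook size) drops from 500 bps to 100 bps, the kind of trade-off a low-bitrate tokenizer targets. A real codec would train the queries, encoder, and quantizer jointly against reconstruction and semantic objectives; none of that is modeled here.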
Related papers
- WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling (2024) · 6.22
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound (2024) · 10.97
- AudioLM: A Language Modeling Approach to Audio Generation (2022) · 18.91
- Continuous Audio Language Models (2025) · 0.00
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model (2024) · 15.02
- DM-Codec: Distilling Multimodal Representations for Speech Tokenization (2024) · 3.53
- Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition (2023) · 8.60
- UniAudio 1.5: Large Language Model-Driven Audio Codec Is a Few-Shot Audio Task Learner (2024) · 0.00