MuseTok: Symbolic Music Tokenization For Generation And Semantic Understanding
2025 · Jingyue Huang, Zachary Novack, Phillip Long, et al.
Abstract
Discrete representation learning has shown promising results across various domains, including generation and understanding in image, speech and language. Inspired by these advances, we propose MuseTok, a tokenization method for symbolic music, and investigate its effectiveness in both music generation and understanding tasks. MuseTok employs the residual vector quantized-variational autoencoder (RQ-VAE) on bar-wise music segments within a Transformer-based encoder-decoder framework, producing music codes that achieve high-fidelity music reconstruction and accurate understanding of music theory. For comprehensive evaluation, we apply MuseTok to music generation and semantic understanding tasks, including melody extraction, chord recognition, and emotion recognition. Models incorporating MuseTok outperform previous representation learning baselines in semantic understanding while maintaining comparable performance in content generation. Furthermore, qualitative analyses on MuseTok codes
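As a rough illustration of the residual vector quantization idea named in the abstract, the sketch below quantizes one vector with a stack of codebooks, each level encoding what the previous levels left over. It is a toy under stated assumptions, not the authors' implementation: the function names `nearest` and `residual_quantize` and the hand-written codebooks are hypothetical. In MuseTok the input would presumably be a bar-level latent from the Transformer encoder, and the codebooks would be learned.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize x level by level; each level encodes the remaining residual."""
    residual = list(x)
    recon = [0.0] * len(x)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        entry = codebook[idx]
        codes.append(idx)
        # Accumulate the reconstruction and shrink the residual.
        recon = [r + e for r, e in zip(recon, entry)]
        residual = [r - e for r, e in zip(residual, entry)]
    return codes, recon


# Toy codebooks (assumed values, for illustration only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],        # coarse level
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],  # fine level
]
codes, recon = residual_quantize([1.2, 0.8], codebooks)
print(codes, recon)  # [1, 1] [1.25, 0.75]
```

Each input is thus represented by one code index per level, and adding levels refines the reconstruction without enlarging any single codebook.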
Related papers
- Nested Music Transformer: Sequentially Decoding Compound Tokens In Symbolic Music And Audio Generation (2024)
- MuQ: Self-supervised Music Representation Learning With Mel Residual Vector Quantization (2025)
- Fusing Memory And Attention: A Study On LSTM, Transformer And Hybrid Architectures For Symbolic Music Generation (2026)
- Amadeus: Autoregressive Model With Bidirectional Attribute Modelling For Symbolic Music (2025)
- M\(^{2}\)UGen: Multi-modal Music Understanding And Generation With The Power Of Large Language Models (2023)
- Quality-aware Masked Diffusion Transformer For Enhanced Music Generation (2024)
- Unified Cross-modal Translation Of Score Images, Symbolic Music, And Performance Audio (2025)
- Vec-Tok Speech: Speech Vectorization And Tokenization For Neural Speech Generation (2023)