JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models
2023 · Peike Li, Boyu Chen, Yao Yao, et al.
Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at https://jenmusic.ai/audio-demos
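The abstract names three tasks (text-guided generation, inpainting, continuation) handled by one model but does not say how they are told apart. One way to picture it, offered only as an illustrative assumption and not as the authors' implementation, is that the tasks differ solely in which audio frames are supplied as context and which are left for the model to generate. The function name `task_mask` and its parameters are hypothetical.

```python
from typing import List


def task_mask(task: str, num_frames: int, start: int = 0, end: int = 0) -> List[int]:
    """Return a per-frame mask: 1 = frame to generate, 0 = frame given as context.

    Hypothetical helper for illustration; the abstract does not define this.
    """
    if task == "generation":
        # No audio context: every frame is generated from the text prompt alone.
        return [1] * num_frames
    if task == "continuation":
        # Frames before `start` are given; frames from `start` onward are generated.
        if not 0 <= start <= num_frames:
            raise ValueError("start must lie within [0, num_frames]")
        return [0] * start + [1] * (num_frames - start)
    if task == "inpainting":
        # Frames in [start, end) are generated; both sides are given as context.
        if not 0 <= start <= end <= num_frames:
            raise ValueError("need 0 <= start <= end <= num_frames")
        return [0] * start + [1] * (end - start) + [0] * (num_frames - end)
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name, kwargs in [
        ("generation", {}),
        ("continuation", {"start": 2}),
        ("inpainting", {"start": 2, "end": 4}),
    ]:
        print(name, task_mask(name, 6, **kwargs))
```

Under this reading, generation is the limiting case of continuation with no context frames, which is one reason a single masked formulation can cover all three tasks.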
Related papers
- Mustango: Toward Controllable Text-to-music Generation (2023) · 11.67
- MusicLDM: Enhancing Novelty In Text-to-music Generation Using Beat-synchronous Mixup Strategies (2023) · 13.55
- Quality-aware Masked Diffusion Transformer For Enhanced Music Generation (2024) · 5.60
- M$^{2}$UGen: Multi-modal Music Understanding And Generation With The Power Of Large Language Models (2023) · 0.00
- Samuel: Efficient Vocal-conditioned Music Generation Via Soft Alignment Attention And Latent Diffusion (2025) · 0.00
- Generalized Multi-source Inference For Text Conditioned Music Diffusion Models (2024) · 0.00
- Editing Music With Melody And Text: Using ControlNet For Diffusion Transformer (2024) · 5.84
- Diff-A-Riff: Musical Accompaniment Co-creation Via Latent Diffusion Models (2024) · 0.00