DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
2025 · Ziqian Ning, Huakang Chen, Yuepeng Jiang, et al.
Abstract
Recent advancements in music generation have garnered significant attention, yet existing approaches face critical limitations. Some current generative models can only synthesize either the vocal track or the accompaniment track. While some models can generate combined vocal and accompaniment, they typically rely on meticulously designed multi-stage cascading architectures and intricate data pipelines, hindering scalability. Additionally, most systems are restricted to generating short musical segments rather than full-length songs. Furthermore, widely used language model-based methods suffer from slow inference speeds. To address these challenges, we propose DiffRhythm, the first latent diffusion-based song generation model capable of synthesizing complete songs with both vocal and accompaniment for durations of up to 4m45s in only ten seconds, maintaining high musicality and intelligibility. Despite its remarkable capabilities, DiffRhythm is designed to be simple and elegant: it elim…
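As a rough worked check of the speed claim, assume the quoted "ten seconds" is the wall-clock generation time for a maximum-length 4m45s song (the abstract does not state the hardware or whether this figure is an average). Under that assumption, the implied real-time factor (generation time divided by audio duration) is:

$$
\mathrm{RTF} = \frac{t_{\text{gen}}}{t_{\text{audio}}} \approx \frac{10\ \text{s}}{4 \times 60\ \text{s} + 45\ \text{s}} = \frac{10}{285} \approx 0.035,
\qquad
\frac{t_{\text{audio}}}{t_{\text{gen}}} \approx \frac{285}{10} = 28.5
$$

In other words, the claim corresponds to producing audio roughly 28 times faster than real time, which is the contrast the abstract draws with slower language model-based methods.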
Related papers
- DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization (2025)
- DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching (2025)
- Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models (2024)
- Samuel: Efficient Vocal-Conditioned Music Generation via Soft Alignment Attention and Latent Diffusion (2025)
- MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation (2025)
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies (2023)
- Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation (2025)
- EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis (2023)