DiffRhythm+: Controllable And Flexible Full-length Song Generation With Preference Optimization
2025 · Huakang Chen, Yuepeng Jiang, Guobin Ma, et al.
Abstract
Songs, as a central form of musical art, exemplify the richness of human intelligence and creativity. While recent advances in generative modeling have enabled notable progress in long-form song generation, current systems for full-length song synthesis still face major challenges, including data imbalance, insufficient controllability, and inconsistent musical quality. DiffRhythm, a pioneering diffusion-based model, advanced the field by generating full-length songs with expressive vocals and accompaniment. However, its performance was constrained by an unbalanced model training dataset and limited controllability over musical style, resulting in noticeable quality disparities and restricted creative flexibility. To address these limitations, we propose DiffRhythm+, an enhanced diffusion-based framework for controllable and flexible full-length song generation. DiffRhythm+ leverages a substantially expanded and balanced training dataset to mitigate issues such as repetition and omissi
Related papers
- DiffRhythm: Blazingly Fast And Embarrassingly Simple End-to-end Full-length Song Generation With Latent Diffusion (2025)
- DiffRhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching (2025)
- Diff-a-riff: Musical Accompaniment Co-creation Via Latent Diffusion Models (2024)
- Motionrag-diff: A Retrieval-augmented Diffusion Framework For Long-term Music-to-dance Generation (2025)
- Conditional Diffusion As Latent Constraints For Controllable Symbolic Music Generation (2025)
- Segtune: Structured And Fine-grained Control For Song Generation (2025)
- MusicLDM: Enhancing Novelty In Text-to-music Generation Using Beat-synchronous Mixup Strategies (2023)
- Samuel: Efficient Vocal-conditioned Music Generation Via Soft Alignment Attention And Latent Diffusion (2025)