FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
2024 · Jianyi Chen, Wei Xue, Xu Tan, et al.
Abstract
Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to developing human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG. However, this method is extremely slow because it generates semantic and acoustic tokens recursively, which makes real-time applications impossible. In this paper, we aim to develop a fast SAG method that can create high-quality and coherent accompaniments. We develop a non-AR diffusion-based framework that, by carefully designing the conditions inferred from the vocal signals, generates the Mel spectrogram of the target accompaniment directly. With diffusion and Mel spectrogram modeling, the proposed method significantly simplifies the AR token-based SingSong framework and substantially accelerates generation. We also design semantic projection and prior projection blocks, as well as a set of loss functions, to ensure the
Related papers
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (2021)
- SingGAN: Generative Adversarial Network for High-Fidelity Singing Voice Generation (2021)
- Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment (2024)
- ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps (2024)
- SongGen: A Single Stage Auto-Regressive Transformer for Text-to-Song Generation (2025)
- XiaoiceSing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network (2022)
- Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling (2019)
- MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-Free Diffusion Guidance (2024)