Stylus: Repurposing Stable Diffusion For Training-free Music Style Transfer On Mel-spectrograms
2024 · Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, et al.
Abstract
Music style transfer enables personalized music creation by blending the structure of a source with the stylistic attributes of a reference. Existing text-conditioned and diffusion-based approaches show promise but often require paired datasets, extensive training, or detailed annotations. We present Stylus, a training-free framework that repurposes a pre-trained Stable Diffusion model for music style transfer in the mel-spectrogram domain. Stylus manipulates self-attention by injecting style key-value features while preserving source queries to maintain musical structure. To improve fidelity, we introduce a phase-preserving reconstruction strategy that avoids artifacts from Griffin-Lim reconstruction, and we adopt classifier-free-guidance-inspired control for adjustable stylization and multi-style blending. In extensive evaluations, Stylus outperforms state-of-the-art baselines, achieving 34.1% higher content preservation and 25.7% better perceptual quality without any additional training.
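The attention manipulation described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the function names, the single-head layout, and the `guidance` blending rule (modeled on the extrapolation form of classifier-free guidance) are all assumptions made for illustration. Queries come from the source, so each source position still decides where to attend, while keys and values come from the style reference.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain single-head scaled dot-product attention.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax((q @ k.T) * scale) @ v

def style_injected_attention(q_src, k_src, v_src, k_sty, v_sty, guidance=1.0):
    # Hypothetical helper: source queries attend to style keys/values.
    content_out = attention(q_src, k_src, v_src)  # ordinary self-attention
    styled_out = attention(q_src, k_sty, v_sty)   # style K/V injected
    # Assumed CFG-like control: 0 keeps the source output, 1 uses the
    # fully stylized output, values in between interpolate.
    return content_out + guidance * (styled_out - content_out)

rng = np.random.default_rng(0)
n_src, n_sty, d = 6, 8, 4  # toy token counts and feature width
q_src, k_src, v_src = (rng.standard_normal((n_src, d)) for _ in range(3))
k_sty, v_sty = (rng.standard_normal((n_sty, d)) for _ in range(2))

out = style_injected_attention(q_src, k_src, v_src, k_sty, v_sty, guidance=0.7)
print(out.shape)  # (6, 4): one output row per source query
```

The output always has one row per source query regardless of how many style tokens are supplied, which is one way to see why keeping the source queries preserves the layout of the source spectrogram.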
Related papers
- Play As You Like: Timbre-enhanced Multi-modal Music Style Transfer (2018) 9.92
- Time Domain Neural Audio Style Transfer (2017) 0.00
- Interpretable Style Transfer For Text-to-speech With Controlvae And Diffusion Bridge (2023) 5.24
- Fine-grained Style Modeling, Transfer And Prediction In Text-to-speech Synthesis Via Phone-level Content-style Disentanglement (2020) 9.41
- Latent Diffusion Bridges For Unsupervised Musical Audio Timbre Transfer (2024) 3.58
- Speech-to-speech Translation With Discrete-unit-based Style Transfer (2023) 0.00
- Enriching Source Style Transfer In Recognition-synthesis Based Non-parallel Voice Conversion (2021) 9.23
- Style Equalization: Unsupervised Learning Of Controllable Generative Sequence Models (2021) 0.00