Play As You Like: Timbre-enhanced Multi-modal Music Style Transfer
2018 · Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, et al.
Abstract
Style transfer of polyphonic music recordings is a challenging task: the system must model diverse, imaginative, and reasonable music pieces in a style different from their original one. To achieve this, it is critical to learn stable multi-modal representations for both the domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner. In this paper, we propose an unsupervised music style transfer method that does not need parallel data. To characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features including mel-frequency
Related papers
- Stylus: Repurposing Stable Diffusion For Training-free Music Style Transfer On Mel-spectrograms (2024) 0.00
- Time Domain Neural Audio Style Transfer (2017) 0.00
- Timbre Transfer With Variational Auto Encoding And Cycle-consistent Adversarial Networks (2021) 0.00
- MM-TTS: Multi-modal Prompt Based Style Transfer For Expressive Text-to-speech Synthesis (2023) 8.60
- Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios (2021) 6.77
- Unified Cross-modal Translation Of Score Images, Symbolic Music, And Performance Audio (2025) 0.00
- Latent Diffusion Bridges For Unsupervised Musical Audio Timbre Transfer (2024) 3.58
- Multi-speaker Multi-style Speech Synthesis With Timbre And Style Disentanglement (2022) 6.77