Leveraging Symmetrical Convolutional Transformer Networks For Speech To Singing Voice Style Transfer
2022 · Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi
Abstract
In this paper, we propose a model to perform style transfer of speech to singing voice. Unlike previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving speaker identity and naturalness. The proposed SymNet model comprises a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel speech and singing voice data. In these experiments, we show that the proposed SymNet model improves the objective reconstruction quality significan
Related papers
- Unsupervised Singing Voice Conversion (2019)
- Improving Data Augmentation-based Cross-speaker Style Transfer For TTS With Singing Voice, Style Filtering, And F0 Matching (2024)
- Singing Voice Conversion With Non-parallel Data (2019)
- Self-supervised Singing Voice Pre-training Towards Speech-to-singing Conversion (2024)
- Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach For One-shot Singing Voice Conversion (2023)
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023)
- Enriching Source Style Transfer In Recognition-synthesis Based Non-parallel Voice Conversion (2021)
- Sequence-to-sequence Singing Synthesis Using The Feed-forward Transformer (2019)