Singing Synthesis: With A Little Help From My Attention
2019 · Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, et al.
Abstract
We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions. These two classes of models have significantly affected the field of text-to-speech, but have never been thoroughly applied to the task of singing synthesis. UTACO demonstrates that attention can be successfully applied to the singing synthesis field and improves naturalness over the state of the art. The system requires considerably less explicit modelling of voice features such as F0 patterns, vibratos, and note and phoneme durations, than previous models in the literature. Despite this, it shows a strong improvement in naturalness with respect to previous neural singing synthesis models. The model does not require any durations or pitch patterns as inputs, and learns to insert vibrato autonomously according to the musical context. However, we observe that, by completely dispensing with any explicit duration modelling it becomes ha
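As a rough illustration of the "dilated causal convolutions" mentioned in the abstract, the following is a minimal sketch in plain Python. It is not UTACO's vocoder: the abstract does not give the layer configuration, so the kernel size of 2, the dilations (1, 2, 4), and the function names `causal_dilated_conv1d` and `receptive_field` are all assumptions chosen for illustration. The sketch only shows the two defining properties: each output sample depends on current and past inputs only, and stacking layers with growing dilation widens the receptive field quickly.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zeros (left padding), so the
    output at time t never depends on inputs later than t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample
    after stacking one layer per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical stack: kernel size 2, dilations doubling per layer.
dilations = (1, 2, 4)

# Feed a unit impulse through the stack to see how far it spreads.
signal = [1.0] + [0.0] * 7
hidden = signal
for d in dilations:
    hidden = causal_dilated_conv1d(hidden, [1.0, 1.0], d)

print(hidden)
print(receptive_field(2, dilations))
```

With all-ones weights, the impulse at the first sample reaches exactly as many later samples as the receptive field covers, which is one way to check the formula by hand.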
Related papers
- Singing Voice Synthesis Based On A Musical Note Position-aware Attention Mechanism (2022) 0.00
- Sequence-to-sequence Singing Synthesis Using The Feed-forward Transformer (2019) 10.85
- UniSyn: An End-to-end Unified Model For Text-to-speech And Singing Voice Synthesis (2022) 0.00
- A Melody-unsupervision Model For Singing Voice Synthesis (2021) 5.84
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020) 7.81
- ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-decoder Acoustic Models And WaveRNN Vocoders (2020) 11.49
- Automatic Lyrics Transcription Using Dilated Convolutional Neural Networks With Self-attention (2020) 10.07
- Singing Voice Synthesis With Vibrato Modeling And Latent Energy Representation (2022) 5.24