Analysing Deep Learning-spectral Envelope Prediction Methods For Singing Synthesis
2019 · Frederik Bous, Axel Roebel
Abstract
We investigate several hyper-parameters of neural networks used to generate spectral envelopes for singing synthesis. We perform two perceptual tests: the first compares two models directly, and the second ranks models by mean opinion score. With these tests we show that, when learning to predict spectral envelopes, 2d-convolutions are superior to previously proposed 1d-convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental comparison of learning to predict a probability distribution versus single samples was also performed but turned out to be inconclusive. We propose a network architecture that incorporates the improvements we found to be useful, and we show in our experiments that this network produces better results than other state-of-the-art methods.
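The two training strategies contrasted in the abstract can be illustrated with a minimal sketch. It assumes a toy linear one-step predictor rather than the network studied in the paper. The names `toy_model`, `loss_noise_injection` and `loss_iterated` are hypothetical and chosen only for illustration. The first loss feeds noisy ground-truth frames to the predictor. The second feeds the predictor's own outputs back in for several steps and scores every predicted frame.

```python
import numpy as np

def toy_model(prev_frame, weights):
    """Hypothetical one-step predictor: a linear map from one frame to the next."""
    return prev_frame @ weights

def loss_noise_injection(frames, weights, noise_std, rng):
    """Predict each frame from the ground-truth previous frame plus Gaussian noise."""
    clean_inputs = frames[:-1]
    noisy_inputs = clean_inputs + rng.normal(0.0, noise_std, size=clean_inputs.shape)
    preds = toy_model(noisy_inputs, weights)
    return float(np.mean((preds - frames[1:]) ** 2))

def loss_iterated(frames, weights, n_steps):
    """Start from a true frame, feed predictions back in, and score all n_steps frames."""
    errors = []
    for start in range(len(frames) - n_steps):
        current = frames[start]
        for step in range(1, n_steps + 1):
            current = toy_model(current, weights)  # the input is the model's own output
            errors.append(np.mean((current - frames[start + step]) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in for a sequence of spectral envelopes (assumption: 50 frames, 8 bins).
rng = np.random.default_rng(0)
frames = np.cumsum(rng.normal(size=(50, 8)), axis=0) * 0.1
weights = np.eye(8)  # identity predictor: "the next frame equals the current frame"

print("noise injection:", loss_noise_injection(frames, weights, 0.1, rng))
print("iterated, 4 steps:", loss_iterated(frames, weights, 4))
```

With a single step and zero noise the two losses coincide. They diverge once the iterated loss has to account for errors that accumulate over several self-fed steps, which is the situation a model faces at synthesis time.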
Related papers
- An Empirical Study On End-to-end Singing Voice Synthesis With Encoder-decoder Architectures (2021) · 0.00
- Towards Improving Harmonic Sensitivity And Prediction Stability For Singing Melody Extraction (2023) · 0.00
- Adversarial Multi-task Learning For Disentangling Timbre And Pitch In Singing Voice Synthesis (2022) · 4.52
- A Neural Parametric Singing Synthesizer (2017) · 10.97
- Fast And High-quality Singing Voice Synthesis System Based On Convolutional Neural Networks (2019) · 8.82
- A Recurrent Encoder-decoder Approach With Skip-filtering Connections For Monaural Singing Voice Separation (2017) · 9.41
- Singing Voice Synthesis Based On Convolutional Neural Networks (2019) · 0.00
- Semi-supervised Learning For Singing Synthesis Timbre (2020) · 3.58