Improving Singing Voice Separation Using Deep U-Net and Wave-U-Net with Data Augmentation
2019 · Alice Cohen-Hadria, Axel Roebel, Geoffroy Peeters
Abstract
State-of-the-art singing voice separation is based on deep learning, using CNN structures with skip connections (such as the U-Net model, the Wave-U-Net model, or MSDENSELSTM). A key to the success of these models is the availability of a large amount of training data. In this study, we address singing voice separation for mono signals and compare the U-Net and the Wave-U-Net, which are structurally similar but work on different input representations. First, we report a few results on variations of the U-Net model. Second, we discuss the potential of state-of-the-art speech and music transformation algorithms for augmenting existing data sets and demonstrate that the effect of these augmentations depends on the signal representation used by the model. The results demonstrate a considerable improvement due to the augmentation for both models. However, pitch transposition is the most effective augmentation strategy for the U-Net model, while tr
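As a rough illustration of what pitch-transposition augmentation of a training pair can look like, the sketch below transposes an isolated vocal stem and remixes it with the unchanged accompaniment to form a new mono mixture. This is not the authors' method: the abstract refers to state-of-the-art transformation algorithms, whereas this stand-in uses naive resampling, which also changes duration. The function names `naive_transpose`, `fit_length`, and `augment_pair` are hypothetical, and the availability of separate vocal and accompaniment stems is an assumption.

```python
import numpy as np

def naive_transpose(signal, semitones):
    """Crude pitch transposition by resampling.

    Assumption: a stand-in for a real transformation algorithm.
    It shifts pitch by `semitones` but also scales the duration.
    """
    ratio = 2.0 ** (semitones / 12.0)
    n_out = int(round(len(signal) / ratio))
    # Read the input at positions spaced `ratio` samples apart.
    positions = np.arange(n_out) * ratio
    return np.interp(positions, np.arange(len(signal)), signal)

def fit_length(signal, length):
    """Trim or zero-pad `signal` to exactly `length` samples."""
    if len(signal) >= length:
        return signal[:length]
    return np.pad(signal, (0, length - len(signal)))

def augment_pair(vocals, accompaniment, semitones):
    """Return (mixture, target_vocals) for one augmented mono example."""
    shifted = fit_length(naive_transpose(vocals, semitones), len(accompaniment))
    mixture = shifted + accompaniment
    return mixture, shifted
```

A call such as `augment_pair(vocals, accompaniment, semitones=2)` would yield a new mixture and its matching vocal target; in practice, a duration-preserving pitch shifter would replace `naive_transpose`.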
Related papers
- Improving Singing Voice Separation With The Wave-U-Net Using Minimum Hyperspherical Energy (2019) · 7.16
- Singing Voice Separation: A Study On Training Data (2019) · 10.07
- Investigating U-Nets With Various Intermediate Blocks For Spectrogram-based Singing Voice Separation (2019) · 0.00
- SingAug: Data Augmentation For Singing Voice Synthesis With Cycle-consistent Training Strategy (2022) · 7.16
- Depthwise Separable Convolutions Versus Recurrent Neural Networks For Monaural Singing Voice Separation (2020) · 0.00
- Improved Speech Enhancement With The Wave-U-Net (2018) · 0.00
- Wave-U-Net: A Multi-scale Neural Network For End-to-end Audio Source Separation (2018) · 0.00
- Unsupervised Singing Voice Conversion (2019) · 11.19