Singing Voice Separation Using A Deep Convolutional Neural Network Trained By Ideal Binary Mask And Cross Entropy
2018 · Kin Wah Edward Lin, Balamurali B. T., Enyan Koh, et al.
Abstract
Separating a singing voice from its music accompaniment remains an important challenge in the field of music information retrieval. We present a unique neural network approach inspired by a technique that has revolutionized the field of computer vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms. The pixel-wise classification technique directly estimates the sound source label for each time-frequency (T-F) bin in our spectrogram image, thus eliminating common pre- and postprocessing tasks. The proposed network is trained using the Ideal Binary Mask (IBM) as the target output label. The IBM identifies the dominant sound source in each T-F bin of the magnitude spectrogram of a mixture signal, treating each T-F bin as a pixel that carries one label per sound source. Cross entropy is used as the training objective, so as to minimize the average probability error between the target a
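As a rough illustration of the two ideas named in the abstract, the sketch below builds an IBM for a toy two-source case (vocal versus accompaniment) and scores a set of predicted per-bin probabilities with average cross entropy. It is not the authors' implementation: the function names, the strict `>` tie rule, the two-source simplification, and the toy values are all assumptions made for this example.

```python
import math

def ideal_binary_mask(vocal_mag, accomp_mag):
    """Label each T-F bin 1 if the vocal magnitude dominates, else 0.

    Assumption: two sources only, and ties go to the accompaniment.
    """
    return [
        [1 if v > a else 0 for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(vocal_mag, accomp_mag)
    ]

def mean_cross_entropy(pred_prob, target_mask, eps=1e-12):
    """Average binary cross entropy over all T-F bins."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred_prob, target_mask):
        for p, t in zip(p_row, t_row):
            # Clamp to avoid log(0) on fully confident predictions.
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

# Toy 2x2 magnitude spectrograms (rows: frequency bins, columns: frames).
vocal = [[0.9, 0.1], [0.2, 0.8]]
accomp = [[0.3, 0.7], [0.6, 0.1]]

ibm = ideal_binary_mask(vocal, accomp)

# Hypothetical network output: probability that each bin is vocal-dominated.
pred = [[0.8, 0.2], [0.4, 0.9]]
loss = mean_cross_entropy(pred, ibm)
print(ibm, round(loss, 4))
```

In a training loop the loss would be computed on the network's output for each mixture spectrogram, with the IBM derived from the isolated source tracks serving as the per-bin target.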
Related papers
- Voice And Accompaniment Separation In Music Using Self-attention Convolutional Neural Network (2020)
- A Recurrent Encoder-decoder Approach With Skip-filtering Connections For Monaural Singing Voice Separation (2017)
- Multi-band Multi-resolution Fully Convolutional Neural Networks For Singing Voice Separation (2019)
- Depthwise Separable Convolutions Versus Recurrent Neural Networks For Monaural Singing Voice Separation (2020)
- Multichannel Singing Voice Separation By Deep Neural Network Informed DOA Constrained CNMF (2020)
- HTMD-Net: A Hybrid Masking-denoising Approach To Time-domain Monaural Singing Voice Separation (2021)
- Unsupervised Singing Voice Conversion (2019)
- Jointly Detecting And Separating Singing Voice: A Multi-task Approach (2018)