HTMD-Net: A Hybrid Masking-denoising Approach To Time-domain Monaural Singing Voice Separation
2021 · Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
Abstract
The advent of deep learning has led to the prevalence of deep neural network architectures for monaural music source separation, with end-to-end approaches that operate directly on the waveform level increasingly receiving research attention. Among these approaches, transformation of the input mixture to a learned latent space, and multiplicative application of a soft mask to the latent mixture, achieves the best performance, but is prone to the introduction of artifacts to the source estimate. To alleviate this problem, in this paper we propose a hybrid time-domain approach, termed the HTMD-Net, combining a lightweight masking component and a denoising module, based on skip connections, in order to refine the source estimated by the masking procedure. Evaluation of our approach in the task of monaural singing voice separation in the musdb18 dataset indicates that our proposed method achieves competitive performance compared to methods based purely on masking when trained under the sam
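The pipeline the abstract describes (encode the mixture into a latent space, apply a multiplicative soft mask, decode, then refine the estimate with a denoiser that uses a skip connection) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the authors' architecture: the weights are random rather than trained, frames do not overlap, every layer is a single linear map, and the additive skip connection is one plausible reading of "based on skip connections". All names (`separate`, `mask_net`, `denoiser`, and so on) are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Stand-in for trained weights (assumption: small uniform random values)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def separate(mixture, frame_len=8, latent_dim=16, seed=0):
    """Toy masking + denoising pass over non-overlapping frames."""
    rng = random.Random(seed)
    encoder = random_matrix(latent_dim, frame_len, rng)    # waveform -> latent
    mask_net = random_matrix(latent_dim, latent_dim, rng)  # latent -> mask logits
    decoder = random_matrix(frame_len, latent_dim, rng)    # latent -> waveform
    denoiser = random_matrix(frame_len, frame_len, rng)    # waveform -> correction

    coarse_out, refined_out, masks = [], [], []
    for start in range(0, len(mixture) - frame_len + 1, frame_len):
        frame = mixture[start:start + frame_len]
        # Learned latent representation (ReLU keeps it non-negative).
        latent = [max(0.0, v) for v in matvec(encoder, frame)]
        # Soft mask with values strictly between 0 and 1.
        mask = [sigmoid(v) for v in matvec(mask_net, latent)]
        # Multiplicative masking in the latent space.
        masked_latent = [m * z for m, z in zip(mask, latent)]
        coarse = matvec(decoder, masked_latent)
        # Denoising stage: the skip connection adds the coarse estimate back,
        # so the denoiser only has to model a correction.
        correction = matvec(denoiser, coarse)
        refined = [c + r for c, r in zip(coarse, correction)]
        coarse_out.extend(coarse)
        refined_out.extend(refined)
        masks.append(mask)
    return coarse_out, refined_out, masks


# Usage: a synthetic 64-sample "mixture" (not real audio).
mixture = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
coarse, refined, masks = separate(mixture)
```

With trained weights, `coarse` would play the role of the source estimated by the masking component and `refined` the output after denoising. Here both are meaningless numerically and only show the data flow.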
Related papers
- MaD TwinNet: Masker-denoiser Architecture With Twin Networks For Monaural Sound Source Separation (2018) · 0.00
- Monaural Singing Voice Separation With Skip-filtering Connections And Recurrent Inference Of Time-frequency Mask (2017) · 10.07
- A Recurrent Encoder-decoder Approach With Skip-filtering Connections For Monaural Singing Voice Separation (2017) · 9.41
- Multichannel Singing Voice Separation By Deep Neural Network Informed DOA Constrained CNMF (2020) · 5.84
- Improving Singing Voice Separation With The Wave-U-Net Using Minimum Hyperspherical Energy (2019) · 7.16
- MMDenseLSTM: An Efficient Combination Of Convolutional And Recurrent Neural Networks For Audio Source Separation (2018) · 15.28
- Jointly Detecting And Separating Singing Voice: A Multi-task Approach (2018) · 7.81
- MBTFNet: Multi-band Temporal-frequency Neural Network For Singing Voice Enhancement (2023) · 3.58