MaD TwinNet: Masker-Denoiser Architecture With Twin Networks For Monaural Sound Source Separation
2018 · Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, et al.
Abstract
The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work we present a novel deep-learning-based method that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique that regularizes a recurrent generative network using a backward-running copy of the network. We evaluate our method using the Demixing Secret Dataset and obtain improvements of 0.37 dB in signal-to-distortion ratio (SDR) and 0.23 dB in signal-to-interference ratio (SIR) over previous SOTA results.
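The Twin Networks idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the plain tanh RNN, the affine map (`w_map`, `b_map`), the squared-distance penalty, and all sizes are assumptions chosen for illustration. The sketch runs one recurrent network forward in time and a second one backward over the same input, then measures how far the mapped forward states are from the backward states; a term of this kind would be added to the training loss so the forward network is encouraged to encode information about the future.

```python
import numpy as np

def run_rnn(x, w_in, w_rec, reverse=False):
    """Run a plain tanh RNN over x (time, features); return states (time, hidden)."""
    n_steps, n_hidden = len(x), w_rec.shape[0]
    order = range(n_steps - 1, -1, -1) if reverse else range(n_steps)
    h = np.zeros(n_hidden)
    states = np.zeros((n_steps, n_hidden))
    for t in order:
        h = np.tanh(x[t] @ w_in + h @ w_rec)
        states[t] = h  # stored at its own time index, whatever the direction
    return states

def twin_penalty(h_fwd, h_bwd, w_map, b_map):
    """Mean over time of the squared distance between mapped forward and backward states."""
    diff = h_fwd @ w_map + b_map - h_bwd
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Hypothetical sizes: 20 time frames, 8 input features, 16 hidden units.
rng = np.random.default_rng(0)
T, F, H = 20, 8, 16
x = rng.standard_normal((T, F))

# Separate (assumed) parameters for the forward network and its backward twin.
w_in_f, w_rec_f = 0.1 * rng.standard_normal((F, H)), 0.1 * rng.standard_normal((H, H))
w_in_b, w_rec_b = 0.1 * rng.standard_normal((F, H)), 0.1 * rng.standard_normal((H, H))

h_fwd = run_rnn(x, w_in_f, w_rec_f)
h_bwd = run_rnn(x, w_in_b, w_rec_b, reverse=True)

# Affine map from forward states to backward states, initialised to the identity here.
w_map, b_map = np.eye(H), np.zeros(H)
penalty = twin_penalty(h_fwd, h_bwd, w_map, b_map)
print(f"twin regularization penalty: {penalty:.4f}")
```

In a training setup the backward network would only be needed while fitting the model; at inference time the forward network can be used alone.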
Related papers
- HTMD-Net: A Hybrid Masking-Denoising Approach To Time-Domain Monaural Singing Voice Separation (2021)
- Multichannel Singing Voice Separation By Deep Neural Network Informed DOA Constrained CNMF (2020)
- A Recurrent Encoder-Decoder Approach With Skip-Filtering Connections For Monaural Singing Voice Separation (2017)
- Monaural Singing Voice Separation With Skip-Filtering Connections And Recurrent Inference Of Time-Frequency Mask (2017)
- Jointly Detecting And Separating Singing Voice: A Multi-Task Approach (2018)
- Voice And Accompaniment Separation In Music Using Self-Attention Convolutional Neural Network (2020)
- Depthwise Separable Convolutions Versus Recurrent Neural Networks For Monaural Singing Voice Separation (2020)
- MMDenseLSTM: An Efficient Combination Of Convolutional And Recurrent Neural Networks For Audio Source Separation (2018)