MBTFNet: Multi-band Temporal-frequency Neural Network For Singing Voice Enhancement
2023 · Weiming Xu, Zhouxuan Chen, Zhili Tan, et al.
Abstract
A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to a model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which specifically removes background music, noise, and even backing vocals from singing recordings. MBTFNet combines inter-band and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
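As background for the SNR-based IPE stage mentioned in the abstract, the standard definition of SNR in decibels can be sketched as follows. This is only an illustration of the quantity itself: the function name `snr_db` and the sample values are assumptions made here, and the computation uses a known clean reference, whereas an enhancement system has to estimate SNR without access to the clean signal. The abstract does not describe how MBTFNet performs that estimation.

```python
import math


def snr_db(clean, noisy):
    """Return the SNR in dB of `noisy` relative to the reference `clean`.

    Everything in `noisy` that is not `clean` is treated as interference
    (hypothetical helper for illustration, not taken from the paper).
    """
    if len(clean) != len(noisy):
        raise ValueError("signals must have the same length")

    # Energy of the reference signal and of the residual interference.
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, noisy))

    if noise_power == 0.0:
        return math.inf   # no interference at all
    if signal_power == 0.0:
        return -math.inf  # silent reference
    return 10.0 * math.log10(signal_power / noise_power)


# Toy example: interference amplitude is one tenth of the signal amplitude.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 1.1, -0.9]
result = snr_db(clean, noisy)
```

With an amplitude ratio of 10 the power ratio is 100, so `result` is approximately 20 dB.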
Related papers
- HTMD-Net: A Hybrid Masking-denoising Approach To Time-domain Monaural Singing Voice Separation (2021) · 2.26
- Multi-band Multi-resolution Fully Convolutional Neural Networks For Singing Voice Separation (2019) · 5.84
- MaD TwinNet: Masker-denoiser Architecture With Twin Networks For Monaural Sound Source Separation (2018) · 0.00
- FB-MSTCN: A Full-band Single-channel Speech Enhancement Method Based On Multi-scale Temporal Convolutional Network (2022) · 6.77
- Jointly Detecting And Separating Singing Voice: A Multi-task Approach (2018) · 7.81
- DMF-Net: A Decoupling-style Multi-band Fusion Model For Full-band Speech Enhancement (2022) · 7.16
- Monaural Speech Enhancement Using A Multi-branch Temporal Convolutional Network (2019) · 3.58
- Towards Improving Harmonic Sensitivity And Prediction Stability For Singing Melody Extraction (2023) · 0.00