Deep Clustering And Conventional Networks For Music Separation: Stronger Together
2016 · Yi Luo, Zhuo Chen, John R. Hershey, et al.
Abstract
Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore comb
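The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes the per-bin embeddings have already been produced by some network (none is trained here), and it assumes k-means as the clustering method, since the abstract only says that bins are clustered in the embedding space. The function names, array shapes, and toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    # Farthest-point initialization: deterministic, adequate for a toy example.
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (N, k).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def masks_from_embeddings(embeddings, n_sources):
    """embeddings: (T, F, D) array -> binary masks of shape (n_sources, T, F)."""
    T, F, D = embeddings.shape
    labels = kmeans(embeddings.reshape(T * F, D), n_sources).reshape(T, F)
    return np.stack([(labels == s).astype(np.float32) for s in range(n_sources)])

# Toy data (hypothetical): 6 frames x 8 frequency bins, 2-dimensional embeddings.
# Bins in the lower half of the frequency axis belong to source 0, the rest to source 1.
rng = np.random.default_rng(0)
true_source = np.zeros((6, 8), dtype=int)
true_source[:, 4:] = 1
embeddings = np.eye(2)[true_source] + 0.05 * rng.standard_normal((6, 8, 2))

masks = masks_from_embeddings(embeddings, n_sources=2)
# Each mask would be multiplied elementwise with the mixture spectrogram
# to obtain one source estimate per cluster.
```

Every time-frequency bin ends up in exactly one mask, so the masks partition the spectrogram. Which cluster index corresponds to which source is arbitrary, and a real system would need some rule to identify, say, the vocal cluster.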
Related papers
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) 0.00
- Orthonormal Embedding-based Deep Clustering For Single-channel Speech Separation (2019) 0.00
- Jointly Detecting And Separating Singing Voice: A Multi-task Approach (2018) 7.81
- Mad Twinnet: Masker-denoiser Architecture With Twin Networks For Monaural Sound Source Separation (2018) 0.00
- Deep Attractor Network For Single-microphone Speaker Separation (2016) 17.88
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) 8.60
- Voice And Accompaniment Separation In Music Using Self-attention Convolutional Neural Network (2020) 0.00
- A Recurrent Encoder-decoder Approach With Skip-filtering Connections For Monaural Singing Voice Separation (2017) 9.41