Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
2018 · Daniel Stoller, Sebastian Ewert, Simon Dixon
Abstract
Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but required for high-quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling technique and a context-aware prediction framework to reduce output artifacts. Experiments for singing voic
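The abstract names three mechanisms without giving details: resampling feature maps across time scales, an upsampling technique, and an output layer that enforces source additivity. The following is a minimal NumPy sketch of what such operations could look like, not the authors' implementation. It assumes downsampling by keeping every other time step, upsampling by linear interpolation between neighbouring samples, and additivity enforced by computing the last source as the mixture minus the other estimates. The function names `decimate`, `upsample_linear` and `additive_outputs` are hypothetical, and no learned layers are included.

```python
import numpy as np


def decimate(x):
    """Halve the time resolution by keeping every other sample (axis 0 is time)."""
    x = np.asarray(x, dtype=float)
    return x[::2]


def upsample_linear(x):
    """Roughly double the time resolution: n samples become 2n - 1.

    Original samples are kept at even positions; each odd position holds
    the mean of its two neighbours (linear interpolation).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    out = np.empty((2 * n - 1,) + x.shape[1:], dtype=float)
    out[0::2] = x
    out[1::2] = 0.5 * (x[:-1] + x[1:])
    return out


def additive_outputs(mixture, predicted):
    """Return K source estimates that sum exactly to the mixture.

    predicted has shape (K - 1, T); the K-th source is the residual
    mixture - sum(predicted), an assumed way to enforce additivity.
    """
    mixture = np.asarray(mixture, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = mixture - predicted.sum(axis=0)
    return np.vstack([predicted, residual[None, :]])
```

With an odd-length input, `upsample_linear(decimate(x))` returns a signal of the original length, which is one reason a sketch like this pairs the two operations; with an even-length input the result is one sample shorter.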
Related papers
- Time-domain Audio Source Separation Based On Wave-U-Net Combined With Discrete Wavelet Transform (2020) 9.76
- Audio Source Separation Via Multi-scale Learning With Dilated Dense U-Nets (2019) 0.00
- Improved Speech Enhancement With The Wave-U-Net (2018) 0.00
- Spectrogram-channels U-Net: A Source Separation Model Viewing Each Channel As The Spectrogram Of Each Source (2018) 0.00
- End-to-end Music Source Separation: Is It Possible In The Waveform Domain? (2018) 11.58
- End-to-end Networks For Supervised Single-channel Speech Separation (2018) 0.00
- Improving Singing Voice Separation With The Wave-U-Net Using Minimum Hyperspherical Energy (2019) 7.16
- End-to-end Source Separation With Adaptive Front-ends (2017) 12.17