A Comprehensive Study Of Speech Separation: Spectrogram Vs Waveform Separation
2019 · Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, et al.
Abstract
Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; the solutions developed so far mostly operate in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with the state-of-the-art frequency-domain solution. In this study, we incorporate effective components of TasNet into a frequency-domain separation method and compare the two approaches in alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we also evaluate separation performance on a speech recognition task. We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequen
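The abstract names Si-SNR as a headline metric without giving its formula. As a minimal sketch, the commonly used definition can be written in plain Python as below: both signals are made zero-mean, the estimate is projected onto the target, and the energy of that projection is compared with the energy of the residual. The function name `si_snr`, the `eps` stabiliser, and the synthetic test signals are illustrative assumptions, not taken from the paper, and the paper's exact formulation may differ in detail.

```python
import math
import random


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Assumption: this follows the commonly used definition; `eps` is a
    small stabiliser added here for illustration only.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the "target" component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    projection = [scale * t for t in tgt]

    # Whatever is left after the projection is treated as noise.
    residual = [e - p for e, p in zip(est, projection)]
    proj_energy = sum(p * p for p in projection)
    resid_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((proj_energy + eps) / (resid_energy + eps))


# Hypothetical signals: white noise standing in for clean speech,
# plus additive noise at one tenth of its amplitude.
random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(4000)]
noisy = [c + 0.1 * random.gauss(0.0, 1.0) for c in clean]

score = si_snr(noisy, clean)
rescaled_score = si_snr([3.0 * x for x in noisy], clean)
```

Rescaling the estimate leaves the score essentially unchanged, which is the sense in which the measure is scale-invariant. A network trained to optimise this criterion directly would typically minimise its negated value, although the abstract does not say which loss the authors use.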
Related papers
- Demystifying TasNet: A Dissecting Approach (2019)
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017)
- End-to-end Multi-channel Speech Separation (2019)
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018)
- End-to-end Networks For Supervised Single-channel Speech Separation (2018)
- Beam-guided TasNet: An Iterative Speech Separation Framework With Multi-channel Output (2021)
- Music Source Separation In The Waveform Domain (2019)
- Filterbank Design For End-to-end Speech Separation (2019)