Toward Speech Separation In The Pre-cocktail Party Problem With TasTas
2020 · Ziqiang Shi, Jiqing Han
Abstract
In this note, we propose using TasTas \cite{shi2020speech} as an end-to-end approach to monaural speech separation in the pre-cocktail party problem. Our experiments on the public WSJ0-5mix corpus yield a 10.41 dB SDR improvement. When online voice data remixing augmentation \cite{zeghidour2020wavesplit} is adopted in training, an 11.14 dB SDR improvement can be achieved. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and TasTas is built on this implementation, so we believe the results in this paper can be reproduced with ease.
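To make the reported metric concrete, the following is a minimal sketch of how an SDR improvement can be computed for one source. It assumes a simplified SDR definition, 10·log10(‖s‖² / ‖s − ŝ‖²), which is not necessarily the exact evaluation used in the paper (BSS Eval style SDR allows additional distortions). The function names `sdr_db` and `sdr_improvement_db` and the toy signals are illustrative only.

```python
import math

def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain signal-to-error ratio, not the full BSS Eval SDR.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)

def sdr_improvement_db(reference, estimate, mixture):
    """SDR of the separated estimate minus SDR of the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)

# Toy example: one target source plus one interfering source.
reference = [math.sin(0.1 * n) for n in range(1000)]
interference = [math.sin(0.37 * n + 1.0) for n in range(1000)]
mixture = [s + i for s, i in zip(reference, interference)]
# Hypothetical separator output that attenuates the interference 10x in amplitude.
estimate = [s + 0.1 * i for s, i in zip(reference, interference)]

# A 10x amplitude reduction of the residual corresponds to about 20 dB.
print(round(sdr_improvement_db(reference, estimate, mixture), 3))
```

In a multi-speaker setting such as WSJ0-5mix, a value like this would typically be computed per source under the best speaker permutation and then averaged; that step is omitted here.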
Related papers
- Speech Separation Based On Multi-stage Elaborated Dual-path Deep BiLSTM With Auxiliary Identity Loss (2020) 9.77
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017) 20.16
- Dual-path Filter Network: Speaker-aware Modeling For Speech Separation (2021) 3.58
- Demystifying TasNet: A Dissecting Approach (2019) 12.10
- Beam-guided TasNet: An Iterative Speech Separation Framework With Multi-channel Output (2021) 9.76
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019) 10.35
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018) 24.08
- Effective Low-cost Time-domain Audio Separation Using Globally Attentive Locally Recurrent Networks (2021) 10.07