TasNet: Time-domain Audio Separation Network for Real-time, Single-channel Speech Separation
2017 · Yi Luo, Nima Mesgarani
Abstract
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition introduces inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to estimation of source masks on enco
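The pipeline the abstract describes (encode a time-domain segment into nonnegative weights, mask those weights per source, decode each masked set back to a waveform) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the random encoder and decoder bases, the ReLU used to obtain nonnegative outputs, the softmax that normalizes masks across sources, and the random `scores` that stand in for the output of a trained separation network. All function names are hypothetical.

```python
import math
import random

def encode(segment, enc_basis):
    """Project a time-domain segment onto each basis row; ReLU keeps outputs nonnegative."""
    return [max(0.0, sum(b * x for b, x in zip(row, segment))) for row in enc_basis]

def softmax_masks(scores):
    """Convert per-source scores (sources x basis) into masks that sum to 1 across sources."""
    n_src, n_basis = len(scores), len(scores[0])
    masks = [[0.0] * n_basis for _ in range(n_src)]
    for j in range(n_basis):
        exps = [math.exp(scores[c][j]) for c in range(n_src)]
        total = sum(exps)
        for c in range(n_src):
            masks[c][j] = exps[c] / total
    return masks

def decode(weights, dec_basis):
    """Linear decoder: a weighted sum of basis rows yields a time-domain segment."""
    length = len(dec_basis[0])
    return [sum(w * row[t] for w, row in zip(weights, dec_basis)) for t in range(length)]

def separate_segment(segment, enc_basis, dec_basis, scores):
    """Mask the encoder output once per source and decode each masked copy."""
    weights = encode(segment, enc_basis)
    masks = softmax_masks(scores)
    return [decode([m * w for m, w in zip(mask, weights)], dec_basis) for mask in masks]

# Toy setup (all values are illustrative assumptions, not learned parameters).
rng = random.Random(0)
L, N, C = 8, 16, 2  # segment length, number of basis signals, number of sources
segment = [rng.uniform(-1, 1) for _ in range(L)]
enc_basis = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(N)]
dec_basis = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(N)]
scores = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(C)]  # placeholder for a network

sources = separate_segment(segment, enc_basis, dec_basis, scores)
mixture_hat = decode(encode(segment, enc_basis), dec_basis)
```

Because the masks in this sketch sum to one across sources and the decoder is linear, the decoded source segments add up to the decoded mixture `mixture_hat`. With random bases the estimates are meaningless as audio; the sketch only shows the data flow, with no frequency decomposition step involved.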
Related papers
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking for Speech Separation (2018)
- Demystifying TasNet: A Dissecting Approach (2019)
- X-TaSNet: Robust and Accurate Time-domain Speaker Extraction Network (2020)
- Beam-guided TasNet: An Iterative Speech Separation Framework with Multi-channel Output (2021)
- End-to-end Training of Time Domain Audio Separation and Recognition (2019)
- Time Domain Audio Visual Speech Separation (2019)
- Effective Low-cost Time-domain Audio Separation Using Globally Attentive Locally Recurrent Networks (2021)
- Speech Separation Based on Multi-stage Elaborated Dual-path Deep BiLSTM with Auxiliary Identity Loss (2020)