Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation
2018 · Yi Luo, Nima Mesgarani
Abstract
Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved by applying a set of weighting functions (masks) to the encoder output. The modified encoder repr
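The encode-then-mask idea described in the abstract can be illustrated with a minimal sketch in plain Python. Everything concrete here is an assumption made for illustration only: the function names (`frame_signal`, `linear_encode`, `apply_masks`), the tiny frame and basis sizes, the random basis, and the placeholder masks. In the actual system the basis and the masks are learned; this sketch only shows the data flow of projecting frames of the waveform with a linear encoder and weighting the result with one mask per speaker.

```python
import random

def frame_signal(signal, frame_len, hop):
    """Split a waveform into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def linear_encode(frames, basis):
    """Linear encoder: project each frame onto every basis vector."""
    return [[sum(b * x for b, x in zip(vec, frame)) for vec in basis]
            for frame in frames]

def apply_masks(encoded, masks):
    """Weight the encoder output elementwise with one mask per speaker."""
    return [[[m * w for m, w in zip(mask_row, enc_row)]
             for mask_row, enc_row in zip(mask, encoded)]
            for mask in masks]

random.seed(0)
frame_len, hop, num_basis = 4, 2, 6  # illustrative sizes, not the paper's

# Stand-in for a mixed waveform and for a learned encoder basis.
mixture = [random.uniform(-1.0, 1.0) for _ in range(16)]
basis = [[random.gauss(0.0, 1.0) for _ in range(frame_len)]
         for _ in range(num_basis)]

frames = frame_signal(mixture, frame_len, hop)
encoded = linear_encode(frames, basis)

# Placeholder masks for two speakers. A trained network would estimate
# these; the complementary choice below is only for the demonstration.
mask_a = [[random.random() for _ in range(num_basis)] for _ in frames]
mask_b = [[1.0 - m for m in row] for row in mask_a]

separated = apply_masks(encoded, [mask_a, mask_b])
```

Because the two placeholder masks are complementary, the two masked representations add back up to the encoder output, which is a convenient sanity check for the sketch but not a constraint stated in the text.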
Related papers
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017) · 20.16
- Inter-channel Conv-TasNet For Multichannel Speech Enhancement (2021) · 0.00
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019) · 10.35
- X-TasNet: Robust And Accurate Time-domain Speaker Extraction Network (2020) · 10.48
- Demystifying TasNet: A Dissecting Approach (2019) · 12.10
- An Enhanced Conv-TasNet Model For Speech Separation Using A Speaker Distance-based Loss Function (2022) · 0.00
- On Time Domain Conformer Models For Monaural Speech Separation In Noisy Reverberant Acoustic Environments (2023) · 5.84
- Beam-guided TasNet: An Iterative Speech Separation Framework With Multi-channel Output (2021) · 9.76