On Time Domain Conformer Models For Monaural Speech Separation In Noisy Reverberant Acoustic Environments
2023 · William Ravenscroft, Stefan Goetze, Thomas Hain
Abstract
Speech separation remains an important topic for multi-speaker technology researchers. Convolution-augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks, which sequentially process local and global information. Time-domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially but have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.
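The SISDR improvement figures quoted in the abstract can be read as the gain in scale-invariant signal-to-distortion ratio of the separated estimate over the unprocessed mixture, both measured against the clean reference. The sketch below follows the commonly used SI-SDR definition, not the authors' evaluation code, which is not shown here. The function names, the mean-removal step, and the `eps` stabiliser are assumptions made for illustration.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def _zero_mean(x):
    """Remove the mean (a common convention before SI-SDR; assumed here)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Project the estimate onto the reference to obtain the scaled target.
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [scale * r for r in ref]
    # Everything in the estimate that is not the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    ratio = (_dot(target, target) + eps) / (_dot(residual, residual) + eps)
    return 10.0 * math.log10(ratio)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the target is obtained by projection, rescaling the estimate leaves the score unchanged, which is what makes the metric scale-invariant. For a two-speaker mixture, the improvement would typically be computed per speaker and averaged.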
Related papers
- Deformable Temporal Convolutional Networks For Monaural Noisy Reverberant Speech Separation (2022) · 8.09
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018) · 24.08
- Speaker-conditioning Single-channel Target Speaker Extraction Using Conformer-based Architectures (2022) · 6.34
- Two-stage Model And Optimal SI-SNR For Monaural Multi-speaker Speech Separation In Noisy Environment (2020) · 0.00
- ConSep: A Noise- And Reverberation-robust Speech Separation Framework By Magnitude Conditioning (2024) · 0.00
- Multi-dimensional And Multi-scale Modeling For Speech Separation Optimized By Discriminative Learning (2023) · 0.00
- TF-Locoformer: Transformer With Local Modeling By Convolution For Speech Separation And Enhancement (2024) · 10.35
- DasFormer: Deep Alternating Spectrogram Transformer For Multi/single-channel Speech Separation (2023) · 0.00