Attention Is All You Need In Speech Separation
2020 · Cem Subakan, Mirco Ravanelli, Samuele Cornell, et al.
Abstract
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves a competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and it is less memory-dema
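The results above are reported as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The sketch below shows one common way to compute that metric in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the toy signals are all hypothetical.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sequences.

    Hypothetical helper for illustration; eps is an assumed stabilizer
    that avoids division by zero and log of zero.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("estimate and target must be non-empty and equal length")

    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the scaled target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]

    # Everything in the estimate that is not explained by the target is noise.
    noise = [e - p for e, p in zip(est, projection)]

    projection_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((projection_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Toy signals (assumed values, not data from the paper).
target = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 0.5, -1.5]    # target plus an interfering signal
estimate = [1.1, -0.9, 1.0, -1.2]   # imperfect separation of the target

improvement_db = si_snr_improvement(estimate, mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.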
Related papers
- Resource-efficient Separation Transformer (2022) 7.81
- Exploring Self-attention Mechanisms For Speech Separation (2022) 12.54
- Tiny-sepformer: A Tiny Time-domain Transformer Network For Speech Separation (2022) 8.82
- Transmask: A Compact And Fast Speech Separation Model Based On Transformer (2021) 8.82
- Speech Separation Using An Asynchronous Fully Recurrent Convolutional Neural Network (2021) 0.00
- A Comparative Study On Transformer Vs RNN In Speech Applications (2019) 20.07
- Mossformer: Pushing The Performance Limit Of Monaural Speech Separation Using Gated Single-head Transformer With Convolution-augmented Joint Self-attentions (2023) 13.55
- An Efficient Speech Separation Network Based On Recurrent Fusion Dilated Convolution And Channel Attention (2023) 0.00