Sandglasset: A Light Multi-granularity Self-attentive Network For Time-domain Speech Separation
2021 · Max W. Y. Lam, Jun Wang, Dan Su, et al.
Abstract
One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, named Sandglasset, which advances the state-of-the-art (SOTA) SS performance at a significantly smaller model size and computational cost. Moving forward through the blocks of Sandglasset, the temporal granularity of the features gradually becomes coarser until the midpoint of the network is reached, and then becomes successively finer again towards the raw signal level. We also find that residual connections between features with the same granularity are critical for preserving information after passing through the bottleneck layer. Experiments show our Sandglasset with only 2.3M parameters has achi
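The sandglass layout described in the abstract can be illustrated with a small sketch. The function names, the choice of 6 blocks, and the downsampling base of 4 are assumptions made for illustration only; the abstract itself does not state them, and this is not the authors' implementation. The sketch only shows how a per-block granularity schedule that is coarsest in the middle pairs up blocks of equal granularity for residual connections.

```python
def granularity_schedule(num_blocks, base=4):
    """Return one downsampling factor per block, coarsest at the midpoint.

    `base` is a hypothetical per-level downsampling factor (an assumption,
    not a value taken from the abstract).
    """
    if num_blocks < 1 or base < 1:
        raise ValueError("num_blocks and base must be positive")
    # The level rises towards the middle block and falls symmetrically after it.
    return [base ** min(b, num_blocks - 1 - b) for b in range(num_blocks)]


def same_granularity_pairs(num_blocks):
    """Pair each block in the first half with its mirror in the second half.

    Mirrored blocks share a granularity, so these are the candidate
    endpoints for residual connections across the bottleneck.
    """
    return [(b, num_blocks - 1 - b) for b in range(num_blocks // 2)]


if __name__ == "__main__":
    n = 6  # hypothetical block count
    print(granularity_schedule(n))
    print(same_granularity_pairs(n))
```

With an odd number of blocks, the middle block is the single coarsest one and has no mirror partner in this sketch.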
Related papers
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017)
- Effective Low-cost Time-domain Audio Separation Using Globally Attentive Locally Recurrent Networks (2021)
- Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation (2022)
- SpatialNet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation (2023)
- TF-GridNet: Integrating Full- And Sub-band Modeling For Speech Separation (2022)
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018)
- Beam-guided TasNet: An Iterative Speech Separation Framework With Multi-channel Output (2021)
- Time Domain Audio Visual Speech Separation (2019)