Effective Low-cost Time-domain Audio Separation Using Globally Attentive Locally Recurrent Networks
2021 · Max W. Y. Lam, Jun Wang, Dan Su, et al.
Abstract
Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation. Nevertheless, conventional TasNets struggle to satisfy the memory and latency constraints of industrial applications. To address this, we design a low-cost, high-performance architecture, the globally attentive locally recurrent (GALR) network. Like the dual-path RNN (DPRNN), GALR first splits a feature sequence into 2D segments and then processes the sequence along both the intra- and inter-segment dimensions. Our main innovation is that, on top of features recurrently processed along the intra-segment dimension, GALR applies a self-attention mechanism to the sequence along the inter-segment dimension, which aggregates context-aware information and also enables parallelization. Our experiments suggest that GALR is notably more effective than prior work. On the one hand, with only 1.5M parameters, it has achieved comparable separation performance at a much…
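The dual-path idea in the abstract (split a feature sequence into 2D segments, recur locally within each segment, attend globally across segments) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, a plain tanh RNN stands in for the recurrent layer (DPRNN-style models typically use BiLSTMs), the attention is a single scaled dot-product head, and the 50% segment overlap, sum-based overlap-add, and residual connections are assumptions chosen for simplicity.

```python
import numpy as np


def segment(x, K):
    """Split an (N, L) feature sequence into overlapping chunks of length K
    with hop P = K // 2 (assumed 50% overlap), giving an (N, K, S) tensor."""
    assert K % 2 == 0, "this sketch assumes an even segment length"
    L = x.shape[1]
    P = K // 2
    rest = (-L) % P                      # extra right padding so hops tile exactly
    xp = np.pad(x, ((0, 0), (P, P + rest)))
    S = (xp.shape[1] - K) // P + 1       # number of segments
    idx = np.arange(K)[:, None] + P * np.arange(S)[None, :]   # (K, S) time indices
    return xp[:, idx], P


def overlap_add(chunks, L, P):
    """Inverse of segment(): sum overlapping chunks back into an (N, L) sequence.
    With 50% overlap every original frame is covered by exactly two chunks."""
    N, K, S = chunks.shape
    out = np.zeros((N, (S - 1) * P + K))
    for s in range(S):
        out[:, s * P:s * P + K] += chunks[:, :, s]
    return out[:, P:P + L]


def intra_segment_rnn(seg, Wx, Wh):
    """Local recurrence: run a plain tanh RNN along the intra-segment axis K,
    for all S segments at once (hypothetical stand-in for a BiLSTM)."""
    _, K, S = seg.shape
    h = np.zeros((Wh.shape[0], S))
    outs = []
    for k in range(K):                   # sequential only over the short axis K
        h = np.tanh(Wx @ seg[:, k, :] + Wh @ h)
        outs.append(h)
    return np.stack(outs, axis=1)        # (H, K, S)


def inter_segment_attention(seg, Wq, Wk, Wv):
    """Global attention: at every intra-segment position k, each of the S
    segments attends to all S segments (single-head scaled dot-product)."""
    X = seg.transpose(1, 2, 0)           # (K, S, N): K sequences of length S
    Q, Km, V = X @ Wq, X @ Wk, X @ Wv    # (K, S, D)
    scores = Q @ Km.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])   # (K, S, S)
    scores -= scores.max(axis=-1, keepdims=True)                 # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)   # softmax over the S segments
    return (A @ V).transpose(2, 0, 1)    # (D, K, S)


def galr_block(seg, p):
    """One simplified block: local recurrence, then global attention, each with
    an assumed residual connection (requires hidden sizes H = D = N)."""
    local = seg + intra_segment_rnn(seg, p["Wx"], p["Wh"])
    return local + inter_segment_attention(local, p["Wq"], p["Wk"], p["Wv"])


rng = np.random.default_rng(0)
N, L, K = 8, 100, 10
x = rng.standard_normal((N, L))
seg, P = segment(x, K)
params = {name: 0.1 * rng.standard_normal((N, N)) for name in ["Wx", "Wh", "Wq", "Wk", "Wv"]}
y = overlap_add(galr_block(seg, params), L, P)
print(seg.shape, y.shape)   # (8, 10, 21) (8, 100)
```

In this sketch the recurrence is still sequential, but only over the short segment length K, while the cross-segment step is a batched matrix product with no step-to-step dependence, which is the kind of parallelism the abstract refers to. Because overlap_add sums chunks rather than averaging them, segmenting and immediately overlap-adding an unmodified input returns twice the input.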
Related papers
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017) · 20.16
- RTFS-Net: Recurrent Time-frequency Modelling For Efficient Audio-visual Speech Separation (2023) · 0.00
- Sandglasset: A Light Multi-granularity Self-attentive Network For Time-domain Speech Separation (2021) · 11.93
- Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation (2022) · 0.00
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019) · 10.35
- Demystifying TasNet: A Dissecting Approach (2019) · 12.10
- Dual-path RNN: Efficient Long Sequence Modeling For Time-domain Single-channel Speech Separation (2019) · 21.06
- LaFurca: Iterative Refined Speech Separation Based On Context-aware Dual-path Parallel Bi-LSTM (2020) · 0.00