Low-latency Deep Clustering For Speech Separation
2019 · Shanshan Wang, Gaurav Naithani, Tuomas Virtanen
Abstract
This paper proposes a low-algorithmic-latency adaptation of the deep clustering approach to speaker-independent speech separation. The adaptation consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms), as required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then used to separate the speakers in the rest of the signal. The buffer duration serves as an initialization phase, after which the system can operate with 8 ms algorithmic latency. We evaluate the proposed approach on two-speaker mixtures from the Wall Street Journal (WSJ0) corpus. We observe that the use of LSTM yields a source-to-distortion ratio (SDR) around one dB lower than that of the baseline bidirectional LSTM. Moreover, using an 8 ms synthesis window ins
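Part c) of the abstract, estimating cluster centres from an initial buffer and then reusing them for the rest of the signal, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the clustering procedure, so plain k-means with random initialization is assumed here, and the function names `estimate_centres` and `frame_masks`, the embedding shapes, and the use of binary masks are all hypothetical choices made for illustration. The embeddings themselves would come from the LSTM network, which is not modelled.

```python
import numpy as np


def estimate_centres(buffer_emb, n_speakers=2, n_iter=20, seed=0):
    """Run plain k-means on buffer embeddings of shape (N, D).

    Assumption: N is the number of time-frequency bins in the initial
    buffer and D is the embedding dimension. Returns centres (K, D).
    """
    rng = np.random.default_rng(seed)
    # Initialize centres from randomly chosen buffer embeddings (a copy).
    idx = rng.choice(len(buffer_emb), size=n_speakers, replace=False)
    centres = buffer_emb[idx].astype(float)
    for _ in range(n_iter):
        # Distance from every embedding to every centre: shape (N, K).
        dists = np.linalg.norm(
            buffer_emb[:, None, :] - centres[None, :, :], axis=-1
        )
        labels = dists.argmin(axis=1)
        for k in range(n_speakers):
            members = buffer_emb[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centres[k] = members.mean(axis=0)
    return centres


def frame_masks(frame_emb, centres):
    """Assign each bin of one frame to its nearest fixed centre.

    frame_emb: (F, D) embeddings for the F frequency bins of one frame.
    Returns binary masks of shape (K, F), one row per speaker.
    """
    dists = np.linalg.norm(
        frame_emb[:, None, :] - centres[None, :, :], axis=-1
    )
    labels = dists.argmin(axis=1)
    speakers = np.arange(len(centres))
    return (labels[None, :] == speakers[:, None]).astype(np.float32)
```

In this sketch, `estimate_centres` would be called once on the embeddings collected during the buffer, and `frame_masks` once per incoming frame afterwards. Because the centres are fixed after initialization, the per-frame step needs no future context, which is consistent with the abstract's description of the buffer as an initialization phase.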
Related papers
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) 0.00
- Skim: Skipping Memory LSTM For Low-latency Real-time Continuous Speech Separation (2022) 10.07
- Analysis Of Deep Clustering As Preprocessing For Automatic Speech Recognition Of Sparsely Overlapping Speech (2019) 9.59
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020) 0.00
- Overlap-aware Low-latency Online Speaker Diarization Based On End-to-end Local Segmentation (2021) 10.35
- Real-time Speech Enhancement And Separation With A Unified Deep Neural Network For Single/dual Talker Scenarios (2023) 2.26
- Orthonormal Embedding-based Deep Clustering For Single-channel Speech Separation (2019) 0.00
- Directed Speech Separation For Automatic Speech Recognition Of Long Form Conversational Speech (2021) 2.26