Low-latency Speaker-independent Continuous Speech Separation
2019 · Takuya Yoshioka, Zhuo Chen, Changliang Liu, et al.
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is generated from one of SI-CSS's output channels nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can be sent directly to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training and a data-driven beamformer, and thus incurs high processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone array-based meeting tra
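The output constraint described in the abstract (a fixed number of channels, each utterance kept whole on a single channel, no overlap within a channel) can be illustrated with a toy sketch. This is not the paper's method and involves no audio or neural network: it only assigns hypothetical utterance time intervals to channels so that the constraint holds. The function name `assign_channels`, the `(start, end)` interval representation, and the default of two channels are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy representation: an utterance is a (start_sec, end_sec) interval.
Utterance = Tuple[float, float]


def assign_channels(
    utterances: List[Utterance], num_channels: int = 2
) -> Optional[List[int]]:
    """Assign each utterance to one channel so no channel holds overlapping speech.

    Returns one channel index per utterance (in input order), or None when more
    than `num_channels` utterances overlap at some instant.
    """
    # Visit utterances in order of start time, as a streaming system would.
    order = sorted(range(len(utterances)), key=lambda i: utterances[i][0])
    channel_free_at = [float("-inf")] * num_channels  # end time of last utterance
    assignment = [-1] * len(utterances)

    for i in order:
        start, end = utterances[i]
        for ch in range(num_channels):
            if channel_free_at[ch] <= start:
                # The whole utterance goes to a single channel; it is never split.
                assignment[i] = ch
                channel_free_at[ch] = end
                break
        else:
            # Every channel is still busy: the fixed channel count is too small.
            return None
    return assignment


if __name__ == "__main__":
    # Two partially overlapping utterances, then a third that starts after the first ends.
    print(assign_channels([(0.0, 3.0), (2.0, 5.0), (4.0, 6.0)]))
```

In this sketch the first free channel is always chosen, whereas the abstract states that the channel for each utterance is selected nondeterministically. The sketch also assumes utterance boundaries are already known, which a real separation system would not have.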
Related papers
- Conversational Speech Separation: An Evaluation Study For Streaming Applications (2022) · 0.00
- Leveraging Real Conversational Data For Multi-channel Continuous Speech Separation (2022) · 0.00
- SkiM: Skipping Memory LSTM For Low-latency Real-time Continuous Speech Separation (2022) · 10.07
- Continuous Speech Separation Using Speaker Inventory For Long Multi-talker Recording (2020) · 7.50
- Speaker Separation Using Speaker Inventories And Estimated Speech (2020) · 6.34
- End-to-end Integration Of Speech Separation And Voice Activity Detection For Low-latency Diarization Of Telephone Conversations (2023) · 4.52
- Low-latency Deep Clustering For Speech Separation (2019) · 8.09
- Noise-aware Speech Separation With Contrastive Learning (2023) · 6.77