Low-latency Speech Separation Guided Diarization For Telephone Conversations
2022 · Giovanni Morrone, Samuele Cornell, Desh Raj, et al.
Abstract
In this paper, we analyze the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection to each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets, Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be readily fed to a speech recognition back-end with performance close to that obtained with the oracle source signals.
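The second stage of SSGD described above (voice activity detection applied to each estimated speaker signal) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the separation model is taken as already run, so the input is a list of per-speaker waveforms, and a simple frame-energy threshold stands in for the voice activity detector, which the abstract does not specify. The function names `energy_vad`, `frames_to_segments`, and `ssgd_diarize` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def energy_vad(signal: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Flag each non-overlapping frame as speech when its mean energy exceeds threshold.

    Assumption: an energy threshold is only a stand-in for a real VAD.
    Trailing samples that do not fill a whole frame are ignored.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


def frames_to_segments(flags: Sequence[bool], frame_dur: float) -> List[Segment]:
    """Merge runs of consecutive active frames into (start, end) segments in seconds."""
    segments: List[Segment] = []
    seg_start = None
    for i, active in enumerate(flags):
        if active and seg_start is None:
            seg_start = i  # a new speech run begins at this frame
        elif not active and seg_start is not None:
            segments.append((seg_start * frame_dur, i * frame_dur))
            seg_start = None
    if seg_start is not None:  # speech run reaches the end of the signal
        segments.append((seg_start * frame_dur, len(flags) * frame_dur))
    return segments


def ssgd_diarize(
    separated: Sequence[Sequence[float]],
    sample_rate: int,
    frame_dur: float = 0.1,
    threshold: float = 1e-3,
) -> Dict[str, List[Segment]]:
    """Run the VAD independently on each separated stream.

    `separated` is assumed to be the output of a speech separation model,
    one waveform per estimated speaker; that model is not implemented here.
    """
    frame_len = int(round(sample_rate * frame_dur))
    return {
        f"spk{idx}": frames_to_segments(energy_vad(sig, frame_len, threshold), frame_dur)
        for idx, sig in enumerate(separated)
    }
```

Because each stream is processed independently, overlapping speech needs no special handling: two speakers active at the same time simply yield overlapping segments. For example, calling `ssgd_diarize(streams, sample_rate=8000)` on two separated 8 kHz telephone streams returns one segment list per speaker, keyed `spk0` and `spk1`.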
Related papers
- End-to-end Integration Of Speech Separation And Voice Activity Detection For Low-latency Diarization Of Telephone Conversations (2023) 4.52
- Neural Blind Source Separation And Diarization For Distant Speech Recognition (2024) 0.00
- An Experimental Review Of Speaker Diarization Methods With Application To Two-speaker Conversational Telephone Speech Recordings (2023) 8.35
- DCF-DS: Deep Cascade Fusion Of Diarization And Separation For Speech Recognition Under Realistic Single-channel Conditions (2024) 3.58
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021) 2.26
- SLOGD: Speaker Location Guided Deflation Approach To Speech Separation (2019) 0.00
- DiarizationLM: Speaker Diarization Post-processing With Large Language Models (2024) 10.21
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020) 13.23