Conversational Speech Separation: An Evaluation Study For Streaming Applications
2022 · Giovanni Morrone, Samuele Cornell, Enrico Zovato, et al.
Abstract
Continuous speech separation (CSS) is a recently proposed framework that aims to separate each speaker from an input mixture signal in a streaming fashion. Here we perform an evaluation study of practical design considerations for a CSS system, addressing important aspects that have been neglected in recent works. In particular, we focus on the trade-off between separation performance, computational requirements, and output latency, showing how an offline separation algorithm can be used to perform CSS with a desired latency. We carry out an extensive analysis of the choice of CSS processing window size and hop size on sparsely overlapped data. We find that the best trade-off between computational burden and performance is obtained with a window of 5 s.
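The abstract describes running an offline separation algorithm over a processing window with a chosen window size and hop size. The sketch below shows one common way to do this. It follows a generic sliding-window CSS recipe and is not necessarily the authors' exact pipeline. Each window is separated independently. The speaker order of each new window is then aligned to the previous window by maximizing the summed dot product over the overlapping samples. Finally, overlapping estimates are averaged. The `separate` callable is a hypothetical stand-in for any offline model. The function name `css_sliding_window` and the uniform averaging are illustrative choices.

```python
import itertools
import numpy as np


def css_sliding_window(mixture, separate, win, hop, num_spk=2):
    """Sketch: run an offline separator over overlapping windows and stitch the outputs.

    mixture  : 1-D array of samples.
    separate : hypothetical offline model mapping a chunk of length `win`
               to an array of shape (num_spk, win), in arbitrary speaker order.
    win, hop : window and hop sizes in samples (0 < hop < win).
    """
    if not 0 < hop < win:
        raise ValueError("need 0 < hop < win so consecutive windows overlap")
    n = len(mixture)
    overlap = win - hop
    # Number of windows needed; zero-pad so the last window reaches the final sample.
    n_win = 1 + max(0, int(np.ceil((n - win) / hop)))
    padded = np.zeros((n_win - 1) * hop + win)
    padded[:n] = mixture

    out = np.zeros((num_spk, len(padded)))
    counts = np.zeros(len(padded))  # number of windows covering each sample
    prev = None  # speaker-aligned outputs of the previous window
    for k in range(n_win):
        start = k * hop
        est = np.asarray(separate(padded[start:start + win]))
        if prev is not None:
            # Choose the permutation whose overlapping part best matches the
            # previous window's aligned outputs (largest summed dot product).
            best = max(
                itertools.permutations(range(num_spk)),
                key=lambda p: sum(
                    np.dot(prev[i, hop:], est[p[i], :overlap]) for i in range(num_spk)
                ),
            )
            est = est[list(best)]
        out[:, start:start + win] += est
        counts[start:start + win] += 1
        prev = est
    # Uniform overlap-add: average the estimates covering each sample, drop padding.
    return out[:, :n] / counts[:n]
```

In this scheme the separator is called once per hop. Each sample is therefore processed about `win / hop` times, so a smaller hop raises the computational load roughly in proportion. A longer window gives the model more context per call but makes each call more expensive. Both sizes are given in samples. For example, a 5 s window at an assumed 16 kHz sampling rate is 80 000 samples.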
Related papers
- Low-latency Speaker-independent Continuous Speech Separation (2019)
- Leveraging Real Conversational Data For Multi-channel Continuous Speech Separation (2022)
- End-to-end Integration Of Speech Separation And Voice Activity Detection For Low-latency Diarization Of Telephone Conversations (2023)
- SkiM: Skipping Memory LSTM For Low-latency Real-time Continuous Speech Separation (2022)
- Causal Self-supervised Pretrained Frontend With Predictive Code For Speech Separation (2025)
- Meeting Recognition With Continuous Speech Separation And Transcription-supported Diarization (2023)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- DiffCSS: Diverse And Expressive Conversational Speech Synthesis With Diffusion Models (2025)