Meeting Recognition With Continuous Speech Separation And Transcription-supported Diarization
2023 · Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, et al.
Abstract
We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet separation architecture, followed by a speaker-agnostic speech recognizer, we achieve state-of-the-art recognition performance in terms of Optimal Reference Combination Word Error Rate (ORC WER). Then, a d-vector-based diarization module is employed to extract speaker embeddings from the enhanced signals and to assign the CSS outputs to the correct speaker. Here, we propose a syntactically informed diarization using sentence- and word-level boundaries of the ASR module to support speaker turn detection. This results in a state-of-the-art Concatenated minimum-Permutation Word Error Rate (cpWER) for the full meeting recognition pipeline.
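To make the final metric concrete, the following is a minimal sketch of how cpWER is commonly defined: each speaker's utterances are concatenated, every assignment of hypothesis speakers to reference speakers is tried, and the assignment with the fewest word errors is scored against the total number of reference words. This is an illustration only, not the authors' evaluation code. The function names `word_errors` and `cp_wer` and the dictionary input layout are assumptions made for this example, and the brute-force search over permutations is only practical for a handful of speakers.

```python
from itertools import permutations


def word_errors(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution or match
            ))
        prev = curr
    return prev[-1]


def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Hypothetical helper: cpWER from {speaker: [utterance, ...]} dicts.

    Assumes utterances are already in temporal order per speaker and that
    the reference contains at least one word.
    """
    # Concatenate all utterances of each speaker into one word sequence.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]

    # Pad with empty speakers so both sides have the same count.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))

    total_ref_words = sum(len(r) for r in refs)

    # Try every speaker assignment and keep the one with the fewest errors.
    best_errors = min(
        sum(word_errors(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    reference = {
        "A": ["hello world", "how are you"],
        "B": ["good morning"],
    }
    hypothesis = {
        "spk1": ["good morning"],
        "spk2": ["hello world", "how is you"],
    }
    print(f"cpWER = {cp_wer(reference, hypothesis):.4f}")
```

Because the hypothesis speaker labels are arbitrary, the permutation search is what lets the metric penalize words attributed to the wrong speaker without depending on label names. ORC WER differs in that it does not require a consistent speaker assignment, so it can be computed on the speaker-agnostic recognizer output before diarization.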
Related papers
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- TS-SEP: Joint Diarization And Separation Conditioned On Estimated Speaker Embeddings (2023)
- End-to-end Diarization For Variable Number Of Speakers With Local-global Networks And Discriminative Speaker Embeddings (2021)
- Simultaneous Diarization And Separation Of Meetings Through The Integration Of Statistical Mixture Models (2024)
- Royalflush Speaker Diarization System For ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (2022)
- Combining TF-GridNet And Mixture Encoder For Continuous Speech Separation For Meeting Transcription (2023)
- Unified Modeling Of Multi-talker Overlapped Speech Recognition And Diarization With A Sidecar Separator (2023)
- Incorporating Spatial Cues In Modular Speaker Diarization For Multi-channel Multi-party Meetings (2024)