A Toolkit For Joint Speaker Diarization And Identification With Application To Speaker-attributed ASR
2024 Β· Giovanni Morrone, Enrico Zovato, Fabio Brugnara, et al.
Abstract
We present a modular toolkit to perform joint speaker diarization and speaker identification. The toolkit can leverage on multiple models and algorithms which are defined in a configuration file. Such flexibility allows our system to work properly in various conditions (e.g., multiple registered speakers' sets, acoustic conditions and languages) and across application domains (e.g. media monitoring, institutional, speech analytics). In this demonstration we show a practical use-case in which speaker-related information is used jointly with automatic speech recognition engines to generate speaker-attributed transcriptions. To achieve that, we employ a user-friendly web-based interface to process audio and video inputs with the chosen configuration.
Authors
(none)
Tags
Stats
Related papers
- 3d-speaker-toolkit: An Open-source Toolkit For Multimodal Speaker Verification And Diarization (2024)6.93
- One Model To Rule Them All ? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023)9.59
- Robust Acoustic Domain Identification With Its Application To Speaker Diarization (2022)2.26
- A Comparative Study Of Modular And Joint Approaches For Speaker-attributed ASR On Monaural Long-form Audio (2021)7.50
- Simultaneous Speech Recognition And Speaker Diarization For Monaural Dialogue Recordings With Target-speaker Acoustic Models (2019)0.00
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)0.00
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)2.26
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)0.00