DiCoW: Diarization-conditioned Whisper For Target Speaker Automatic Speech Recognition
2024 · Alexander Polok, Dominik Klement, Martin Kocour, et al.
Abstract
Speaker-attributed automatic speech recognition (ASR) in multi-speaker environments remains a significant challenge, particularly when systems conditioned on speaker embeddings fail to generalize to unseen speakers. In this work, we propose Diarization-Conditioned Whisper (DiCoW), a novel approach to target-speaker ASR that leverages speaker diarization outputs as conditioning information. DiCoW extends the pre-trained Whisper model by integrating diarization labels directly, eliminating reliance on speaker embeddings and reducing the need for extensive speaker-specific training data. Our method introduces frame-level diarization-dependent transformations (FDDT) and query-key biasing (QKb) techniques to refine the model's focus on target speakers while effectively handling overlapping speech. By leveraging diarization outputs as conditioning signals, DiCoW simplifies the workflow for multi-speaker ASR, improves generalization to unseen speakers and enables more reliable transcription i
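The abstract names FDDT but does not define it. The pure-Python sketch below shows one plausible reading under explicitly assumed details. First, each speaker's diarization activity is turned, for the chosen target speaker, into per-frame probabilities of four classes: silence, target, non-target and overlap. Second, FDDT applies one affine transform per class to each encoder frame and mixes the results by those probabilities. The names `stno_mask` and `fddt`, the assumption that speakers are independent, and the hand-picked weights are all hypothetical illustrations, not DiCoW's released code.

```python
from typing import List, Sequence

# Hypothetical sketch: the four-class mask and the FDDT mixing rule below are
# assumptions made for illustration, not code taken from the DiCoW authors.
Vector = List[float]
Matrix = List[List[float]]


def stno_mask(diar: Sequence[Sequence[float]], target: int) -> List[Vector]:
    """Per-frame [silence, target, non-target, overlap] probabilities.

    diar[s][t] is the probability that speaker s is active in frame t
    (speakers assumed independent); target indexes the speaker to transcribe.
    """
    mask = []
    for t in range(len(diar[target])):
        p_tgt = diar[target][t]
        # Probability that every other speaker is silent in frame t.
        p_others_silent = 1.0
        for s, row in enumerate(diar):
            if s != target:
                p_others_silent *= 1.0 - row[t]
        mask.append([
            (1.0 - p_tgt) * p_others_silent,          # silence
            p_tgt * p_others_silent,                  # target speaks alone
            (1.0 - p_tgt) * (1.0 - p_others_silent),  # only others speak
            p_tgt * (1.0 - p_others_silent),          # overlap
        ])
    return mask


def fddt(frames: Sequence[Vector], mask: Sequence[Vector],
         weights: Sequence[Matrix], biases: Sequence[Vector]) -> List[Vector]:
    """Apply one affine transform per class and mix them by class probability."""
    out = []
    for h, probs in zip(frames, mask):
        mixed = [0.0] * len(h)
        for p, W, b in zip(probs, weights, biases):
            for i in range(len(h)):
                # Row i of W @ h + b for this class.
                affine_i = sum(W[i][j] * h[j] for j in range(len(h))) + b[i]
                mixed[i] += p * affine_i
        out.append(mixed)
    return out


if __name__ == "__main__":
    # Two speakers, three frames; transcribe speaker 0.
    diar = [[0.9, 0.9, 0.0], [0.0, 0.8, 0.9]]
    mask = stno_mask(diar, target=0)
    # Hand-picked (not learned) parameters: keep target frames, zero the rest.
    zero = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights = [zero, identity, zero, zero]
    biases = [[0.0, 0.0]] * 4
    frames = [[1.0, 2.0]] * 3
    print(fddt(frames, mask, weights, biases))
    # approximately [[0.9, 1.8], [0.18, 0.36], [0.0, 0.0]]
```

With the identity transform on the target class and zero transforms elsewhere, each frame is simply scaled by its target-only probability. This makes the intended effect visible: overlapped and non-target frames are attenuated. In a trained model these per-class parameters would be learned. QKb, which the abstract describes as a separate mechanism, is not covered by this sketch.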
Related papers
- Adapting Diarization-conditioned Whisper For End-to-end Multi-talker Speech Recognition (2025) · 0.00
- Target Speaker ASR With Whisper (2024) · 7.16
- Speaker Conditioned Acoustic Modeling For Multi-speaker Conversational ASR (2021) · 4.52
- Simultaneous Speech Recognition And Speaker Diarization For Monaural Dialogue Recordings With Target-speaker Acoustic Models (2019) · 0.00
- One Model To Rule Them All? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023) · 9.59
- DCF-DS: Deep Cascade Fusion Of Diarization And Separation For Speech Recognition Under Realistic Single-channel Conditions (2024) · 3.58
- Towards Unsupervised Speaker Diarization System For Multilingual Telephone Calls Using Pre-trained Whisper Model And Mixture Of Sparse Autoencoders (2024) · 2.26
- Data Efficient Child-adult Speaker Diarization With Simulated Conversations (2024) · 0.00