Improving Speaker Assignment In Speaker-attributed ASR For Real Meeting Applications
2024 · Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, et al.
Abstract
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data. We present a novel study aiming to optimize the use of a Speaker-Attributed ASR (SA-ASR) system in real-life scenarios, such as the AMI meeting corpus, for improved speaker assignment of speech segments. First, we propose a pipeline tailored to real-life applications involving Voice Activity Detection (VAD), Speaker Diarization (SD), and SA-ASR. Second, we advocate using VAD output segments to fine-tune the SA-ASR model, since the model is also applied to VAD segments at test time, and show that this results in a relative reduction in Speaker Error Rate (SER) of up to 28%. Finally, we explore strategies to enhance the extraction of the speaker embedding templates used as inputs by the SA-ASR system. We show that extracting them from SD output rather than from annotated speaker segments results in a relative SER reduction of up to 20%.
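The abstract reports its gains as relative SER reductions. As a minimal sketch of how such a figure is usually computed, the helper below takes a baseline and an improved error rate and returns the relative reduction in percent. The function name `relative_reduction` and the example values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_ser: float, improved_ser: float) -> float:
    """Return the relative reduction of an error rate, in percent.

    Both arguments are error rates in the same unit (for example, percent).
    """
    if baseline_ser <= 0:
        raise ValueError("baseline_ser must be positive")
    return 100.0 * (baseline_ser - improved_ser) / baseline_ser


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the paper).
    baseline = 10.0
    improved = 7.2
    print(f"{relative_reduction(baseline, improved):.1f}% relative reduction")
```

A relative reduction is measured against the baseline, so it differs from the absolute difference between the two rates: in the hypothetical case above the absolute drop is 2.8 points.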
Related papers
- Joint Beamforming And Speaker-attributed ASR For Real Distant-microphone Meeting Transcription (2024)
- A Comparative Study Of Modular And Joint Approaches For Speaker-attributed ASR On Monaural Long-form Audio (2021)
- A Comparative Study On Speaker-attributed Automatic Speech Recognition In Multi-party Meetings (2022)
- A Comparative Study On Multichannel Speaker-attributed Automatic Speech Recognition In Multi-party Meetings (2022)
- Once More Diarization: Improving Meeting Transcription Systems Through Segment-level Speaker Reassignment (2024)
- End-to-end Multichannel Speaker-attributed ASR: Speaker Guided Decoder And Input Feature Analysis (2023)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- Investigation Of End-to-end Speaker-attributed ASR For Continuous Multi-talker Recordings (2020)