Hypothesis Stitcher For End-to-end Speaker-attributed ASR On Long-form Multi-talker Recordings
2021 · Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, et al.
Abstract
An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) model was proposed recently to jointly perform speaker counting, speech recognition and speaker identification. The model achieved a low speaker-attributed word error rate (SA-WER) for monaural overlapped speech comprising an unknown number of speakers. However, the E2E modeling approach is susceptible to the mismatch between the training and testing conditions. It has yet to be investigated whether the E2E SA-ASR model works well for recordings that are much longer than samples seen during training. In this work, we first apply a known decoding technique that was developed to perform single-speaker ASR for long-form audio to our E2E SA-ASR task. Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher. The model takes multiple hypotheses obtained from short audio segments that are extracted from the original long-form input, and it then outputs a fused single hypothesis.
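The segment-then-fuse flow described in the abstract can be sketched roughly as follows. This is only an illustration of the input/output contract, not the method from the paper: the proposed hypothesis stitcher is a sequence-to-sequence model, whereas `naive_stitch` below is a rule-based stand-in that drops exactly repeated words at window boundaries. The function names, the window and hop lengths, and the choice to ignore speaker labels are all assumptions made for this example.

```python
from typing import List, Tuple


def sliding_windows(total_sec: float, win_sec: float, hop_sec: float) -> List[Tuple[float, float]]:
    """Return (start, end) times of windows that cover [0, total_sec].

    Window length and hop are hypothetical parameters; consecutive
    windows overlap whenever hop_sec < win_sec.
    """
    if win_sec <= 0 or hop_sec <= 0:
        raise ValueError("win_sec and hop_sec must be positive")
    windows = []
    start = 0.0
    while True:
        end = min(start + win_sec, total_sec)
        windows.append((start, end))
        if end >= total_sec:
            break
        start += hop_sec
    return windows


def naive_stitch(hyps: List[List[str]]) -> List[str]:
    """Fuse per-window word lists into one word list.

    Rule-based stand-in (NOT the seq2seq stitcher from the paper):
    for each new window, find the longest suffix of the merged output
    that exactly equals a prefix of the new hypothesis and skip it.
    """
    merged: List[str] = []
    for words in hyps:
        overlap = 0
        for k in range(min(len(merged), len(words)), 0, -1):
            if merged[-k:] == words[:k]:
                overlap = k
                break
        merged.extend(words[overlap:])
    return merged


# Example: a 50-second recording cut into 20-second windows with a 10-second hop.
windows = sliding_windows(50, 20, 10)

# Hypothetical per-window hypotheses, as if produced by a short-segment recognizer.
hypotheses = [
    ["hello", "how", "are"],
    ["how", "are", "you"],
    ["you", "today"],
]
fused = naive_stitch(hypotheses)
```

Exact matching breaks down as soon as two windows disagree on a word in the overlap region, which happens routinely with real recognizer output; a learned fusion model of the kind the abstract proposes does not need the overlapping words to match exactly.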
Related papers
- Investigation Of End-to-end Speaker-attributed ASR For Continuous Multi-talker Recordings (2020)
- A Comparative Study Of Modular And Joint Approaches For Speaker-attributed ASR On Monaural Long-form Audio (2021)
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021)
- End-to-end Monaural Multi-speaker ASR System Without Pretraining (2018)
- MSA-ASR: Efficient Multilingual Speaker Attribution With Frozen ASR Models (2024)
- Joint Speaker Counting, Speech Recognition, And Speaker Identification For Overlapped Speech Of Any Number Of Speakers (2020)
- Improving Speaker Assignment In Speaker-attributed ASR For Real Meeting Applications (2024)
- Improved Long-form Speech Recognition By Jointly Modeling The Primary And Non-primary Speakers (2023)