Minimum Bayes Risk Training For End-to-end Speaker-attributed ASR
2020 · Naoyuki Kanda, Zhong Meng, Liang Lu, et al.
Abstract
Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech. In the previous study, the model parameters were trained with the speaker-attributed maximum mutual information (SA-MMI) criterion, which maximizes the joint posterior probability of the multi-talker transcription and speaker identification over the training data. Although SA-MMI training showed promising results for overlapped speech containing various numbers of speakers, the training criterion was not directly linked to the final evaluation metric, i.e., the speaker-attributed word error rate (SA-WER). In this paper, we propose a speaker-attributed minimum Bayes risk (SA-MBR) training method in which the parameters are trained to directly minimize the expected SA-WER over the training data. Experiments using the LibriSpeech corpus show that the proposed SA-MBR trai
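To make "expected SA-WER" concrete, here is a minimal sketch, assuming the expectation is approximated over an N-best list with model scores renormalized by a softmax, as in MWER-style training of sequence-to-sequence models. It also assumes a simple SA-WER in which each speaker's hypothesized words are aligned only against that speaker's reference words, so a speaker missing from the hypothesis yields deletions and an extra speaker yields insertions. The function names (`word_errors`, `sa_errors`, `expected_sa_wer`) are hypothetical. This abstract does not give the exact SA-WER definition, the N-best size, or whether a baseline (such as the mean N-best error) is subtracted, so treat the code as an illustration rather than the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A transcription attributed to speakers: speaker ID -> list of words.
SpeakerTranscript = Dict[str, List[str]]


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of ref word
                         cur[j - 1] + 1,          # insertion of hyp word
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1]


def sa_errors(ref: SpeakerTranscript, hyp: SpeakerTranscript) -> int:
    """Assumed speaker-attributed error count: align words speaker by speaker.

    Words attributed to the wrong speaker therefore count as errors even if
    the words themselves were recognized correctly.
    """
    speakers = set(ref) | set(hyp)
    return sum(word_errors(ref.get(s, []), hyp.get(s, [])) for s in speakers)


def expected_sa_wer(ref: SpeakerTranscript,
                    nbest: List[Tuple[float, SpeakerTranscript]]) -> float:
    """Expected SA-WER over an N-best list of (log_score, hypothesis) pairs.

    Log scores are renormalized over the N-best list (a numerically stable
    softmax), then each hypothesis's SA error count is weighted by its
    renormalized probability and divided by the number of reference words.
    """
    n_ref_words = sum(len(words) for words in ref.values())
    max_score = max(score for score, _ in nbest)
    weights = [math.exp(score - max_score) for score, _ in nbest]
    total = sum(weights)
    expected_errors = sum(
        (w / total) * sa_errors(ref, hyp) for w, (_, hyp) in zip(weights, nbest)
    )
    return expected_errors / n_ref_words
```

For example, with a two-speaker reference and two equally scored hypotheses, one perfect and one containing a single substitution, the expected error count is 0.5 and the expected SA-WER is 0.5 / 4 = 0.125. In actual training, this quantity would be computed from differentiable model scores so that gradients flow through the renormalized weights; the sketch computes only the value.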