Auxiliary Interference Speaker Loss For Target-speaker Speech Recognition
2019 · Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, et al.
Abstract
In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes the target speaker's utterances from a monaural mixture of multiple speakers' speech, given a short sample of the target speaker. The proposed auxiliary loss function additionally attempts to maximize the ASR accuracy for the interference speaker during training. This regularizes the network toward a better representation for speaker separation, thus achieving better accuracy on target-speaker ASR. We evaluated the proposed method using two-speaker mixed speech under various signal-to-interference-ratio (SIR) conditions. We first built a strong target-speaker ASR baseline based on state-of-the-art lattice-free maximum mutual information (LF-MMI) training. This baseline achieved a word error rate (WER) of 18.06% on the test set, while a normal ASR system trained with clean data produced a completely corrupted result (WER of 84.71%). Then, our proposed loss furthe
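The core idea of the auxiliary loss can be sketched as a weighted sum of two training objectives computed from a shared network: one for the target speaker's transcription and one for the interference speaker's transcription. The sketch below is illustrative only. It assumes two output heads over the same frames, swaps in a simple frame-level cross-entropy in place of the LF-MMI objective the abstract mentions, and uses a hypothetical weight named `interference_weight`; none of these names or values come from the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame of output scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def frame_cross_entropy(logits_per_frame: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the reference label at each frame.

    Stand-in (assumption) for the sequence-level LF-MMI objective.
    """
    if len(logits_per_frame) != len(labels) or not labels:
        raise ValueError("exactly one label per frame is required")
    nll = 0.0
    for logits, label in zip(logits_per_frame, labels):
        nll -= log_softmax(logits)[label]
    return nll / len(labels)


def target_speaker_loss_with_aux(target_logits, target_labels,
                                 interference_logits, interference_labels,
                                 interference_weight: float = 0.5) -> float:
    """Target-speaker loss plus a weighted auxiliary interference-speaker loss.

    Both heads are assumed (hypothetically) to share the network that sees the
    mixture and the target speaker's short sample; only the target head would
    be used for decoding. interference_weight = 0 recovers the baseline loss.
    """
    main = frame_cross_entropy(target_logits, target_labels)
    aux = frame_cross_entropy(interference_logits, interference_labels)
    return main + interference_weight * aux
```

With uniform scores over two classes, each head contributes log 2 per frame, so `target_speaker_loss_with_aux(t, [0, 1], t, [1, 0], 0.5)` with `t = [[0.0, 0.0], [0.0, 0.0]]` evaluates to 1.5 · log 2. Because the auxiliary term only shapes training, the interference head can be discarded at test time.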
Related papers
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018)
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022)
- A Hybrid Continuity Loss To Reduce Over-suppression For Time-domain Target Speaker Extraction (2022)
- Elevating Robust Multi-talker ASR By Decoupling Speaker Separation And Speech Recognition (2025)
- Enhancing And Adversarial: Improve ASR With Speaker Labels (2022)
- Transcription-free Fine-tuning Of Speech Separation Models For Noisy And Reverberant Multi-speaker Automatic Speech Recognition (2024)
- Speaker Conditioning Of Acoustic Models Using Affine Transformation For Multi-speaker Speech Recognition (2021)
- Minimum Bayes Risk Training For End-to-end Speaker-attributed ASR (2020)