Exploring End-to-end Multi-channel ASR With Bias Information For Meeting Transcription
2020 · Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, et al.
Abstract
Joint optimization of multi-channel front-end and automatic speech recognition (ASR) has attracted much interest. While promising results have been reported for various tasks, past studies on its meeting transcription application were limited to small-scale experiments. It is still unclear whether such a joint framework can be beneficial for a more practical setup where a massive amount of single-channel training data can be leveraged for building a strong ASR back-end. In this work, we present our investigation on the joint modeling of a mask-based beamformer and Attention-Encoder-Decoder-based ASR in the setting where we have 75k hours of single-channel data and a relatively small amount of real multi-channel data for model training. We explore effective training procedures, including a comparison of simulated and real multi-channel training data. To guide the recognition towards a target speaker and deal with overlapped speech, we also explore various combinations of bias informatio
Related papers
- Joint Beamforming And Speaker-attributed ASR For Real Distant-microphone Meeting Transcription (2024)
- A Comparative Study On Multichannel Speaker-attributed Automatic Speech Recognition In Multi-party Meetings (2022)
- End-to-end Multichannel Speaker-attributed ASR: Speaker Guided Decoder And Input Feature Analysis (2023)
- Exploiting Single-channel Speech For Multi-channel End-to-end Speech Recognition (2021)
- MFCCA: Multi-frame Cross-channel Attention For Multi-speaker ASR In Multi-party Meeting Scenario (2022)
- Improving Speaker Assignment In Speaker-attributed ASR For Real Meeting Applications (2024)
- META-CAT: Speaker-informed Speech Embeddings Via Meta Information Concatenation For Multi-talker ASR (2024)
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)