Exploiting Single-channel Speech For Multi-channel End-to-end Speech Recognition
2021 · Keyu An, Zhijian Ou
Abstract
Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition. However, the integration of multiple modules makes end-to-end training more difficult, particularly because multi-channel speech corpora recorded in real environments at a sizeable scale are relatively limited. This paper explores the use of single-channel data to improve the multi-channel end-to-end speech recognition system. Specifically, we design three schemes to exploit the single-channel data, namely pre-training, data scheduling, and data simulation. Extensive experiments on the CHiME4 and AISHELL-4 datasets demonstrate that all three methods improve multi-channel end-to-end training stability and speech recognition performance, while the data scheduling approach keeps a much simpler pipeline (vs. pre-training) and a lower computation cost (vs. data simulation). Moreover, we give a thorough analysis
Related papers
- Closing The Gap Between Time-domain Multi-channel Speech Enhancement On Real And Simulation Conditions (2021)
- Exploring The Potential Of Data-driven Spatial Audio Enhancement Using A Single-channel Model (2024)
- Exploring End-to-end Multi-channel ASR With Bias Information For Meeting Transcription (2020)
- Multi-channel Target Speech Extraction With Channel Decorrelation And Target Speaker Adaptation (2020)
- MIMO-SPEECH: End-to-end Multi-channel Multi-speaker Speech Recognition (2019)
- Automatic Channel Selection And Spatial Feature Integration For Multi-channel Speech Recognition Across Various Array Topologies (2023)
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020)
- An Investigation Of End-to-end Multichannel Speech Recognition For Reverberant And Mismatch Conditions (2019)