Closing The Gap Between Time-domain Multi-channel Speech Enhancement On Real And Simulation Conditions
2021 · Wangyou Zhang, Jing Shi, Chenda Li, et al.
Abstract
Deep learning-based time-domain models, e.g., Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on time-domain speech enhancement models are conducted under simulated conditions, and it has not been well studied whether their good performance generalizes to real-world scenarios. In this paper, we aim to provide an insightful investigation of applying multi-channel Conv-TasNet-based speech enhancement to both simulated and real data. Our preliminary experiments show a large gap in ASR performance between the two conditions. Several approaches are applied to close this gap, including the integration of multi-channel Conv-TasNet into the beamforming model with various strategies and the joint training of speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that our proposed approaches can greatly reduce the speech recognition performance discrepancy between …
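The abstract does not spell out how Conv-TasNet is combined with beamforming. One common pattern, shown here only as a hedged sketch and not as the authors' method, is to treat the STFT of the enhancement network's multi-channel output as the speech image and the residual (mixture minus estimate) as noise, then build an MVDR beamformer from their spatial covariance matrices using the Souden formulation w(f) = Φ_N⁻¹ Φ_S u / tr(Φ_N⁻¹ Φ_S), where u selects a reference channel. The helper names `spatial_covariance`, `mvdr_weights`, and `apply_beamformer`, the (channels, frames, freqs) array layout, the reference channel, and the diagonal-loading constant `eps` are all assumptions made for illustration.

```python
import numpy as np


def spatial_covariance(stft: np.ndarray) -> np.ndarray:
    """Per-frequency spatial covariance, shape (freqs, channels, channels).

    stft: complex array of shape (channels, frames, freqs).
    """
    frames = stft.shape[1]
    # Phi[f] = (1 / T) * sum_t x(t, f) x(t, f)^H
    return np.einsum("ctf,dtf->fcd", stft, stft.conj()) / frames


def mvdr_weights(speech_stft: np.ndarray, noise_stft: np.ndarray,
                 ref_channel: int = 0, eps: float = 1e-8) -> np.ndarray:
    """Souden-style MVDR weights, shape (freqs, channels).

    Hypothetical helper: speech_stft would be the STFT of an enhancement
    network's multi-channel output, and noise_stft = mixture - speech_stft.
    """
    n_ch = speech_stft.shape[0]
    phi_s = spatial_covariance(speech_stft)
    # Diagonal loading keeps the noise covariance invertible.
    phi_n = spatial_covariance(noise_stft) + eps * np.eye(n_ch)
    # Batched solve: numerator[f] = Phi_N[f]^{-1} Phi_S[f]
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=1, axis2=2)
    # Taking the reference column equals multiplying by the one-hot vector u.
    return numerator[:, :, ref_channel] / (trace[:, None] + eps)


def apply_beamformer(weights: np.ndarray, stft: np.ndarray) -> np.ndarray:
    """Y(t, f) = w(f)^H X(t, f); returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

When the speech image is rank-1 per frequency, these weights pass the reference-channel speech unchanged while minimizing output noise power; with a real network estimate, both properties hold only approximately. The appeal of such a hybrid is that the final output is a linear filter of the observed mixture rather than the network's waveform itself.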
Related papers
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020)
- Exploiting Single-channel Speech For Multi-channel End-to-end Speech Recognition (2021)
- Inter-channel Conv-TasNet For Multichannel Speech Enhancement (2021)
- FB-MSTCN: A Full-band Single-channel Speech Enhancement Method Based On Multi-scale Temporal Convolutional Network (2022)
- ForkNet: Simultaneous Time And Time-frequency Domain Modeling For Speech Enhancement (2023)
- Exploring The Potential Of Data-driven Spatial Audio Enhancement Using A Single-channel Model (2024)
- Multi-channel Speaker Verification For Single And Multi-talker Speech (2020)
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020)