Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR
2022 · Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, et al.
Abstract
Speech enhancement (SE) is usually required as a front end to improve speech quality in noisy environments, but the enhanced speech may be suboptimal for automatic speech recognition (ASR) systems because SE introduces speech distortion. On the other hand, self-supervised pre-training has been shown to make use of large amounts of unlabeled noisy data, which is highly beneficial for the noise robustness of ASR. However, how best to integrate SE with self-supervised pre-training remains unclear. To find an appropriate combination and to reduce the impact of the speech distortion caused by SE, we propose a joint pre-training approach for the SE module and the self-supervised model. First, in the pre-training phase, either the original noisy waveform or the waveform produced by SE is fed into the self-supervised model to learn contextual representations, with the quantized clean speech serving as the target. Second, we propose a d…
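The pre-training objective in the abstract can be sketched roughly as follows, assuming a wav2vec 2.0-style setup: frames of the (optionally enhanced) noisy input are masked, a context network encodes them, and an InfoNCE contrastive loss asks each masked frame to pick out the quantized clean-speech target at its own position among the targets at the other masked positions. This is a minimal illustration under stated assumptions, not the authors' implementation: `quantize` (a nearest-codeword lookup standing in for a learned quantizer), the toy `encoder`, the linear "SE" matrix `S`, the zero mask vector, random per-frame masking, and the temperature of 0.1 are all hypothetical. Only the forward loss computation is shown; in joint training, gradients of this loss would update both the SE module and the self-supervised model.

```python
import numpy as np


def quantize(features, codebook):
    """Nearest-codeword quantization (hypothetical stand-in for a learned quantizer)."""
    # Squared Euclidean distance between every frame (T, D) and every codeword (V, D).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (N, D) and rows of b (M, D)."""
    num = a @ b.T
    den = np.linalg.norm(a, axis=1, keepdims=True) * np.linalg.norm(b, axis=1)[None, :]
    return num / (den + eps)


def contrastive_loss(context, targets, temperature=0.1):
    """InfoNCE: row t's positive is targets[t]; the other rows act as distractors."""
    logits = cosine_matrix(context, targets) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # positives lie on the diagonal


def joint_pretrain_loss(noisy, clean, codebook, encoder, se=None, mask_prob=0.4, rng=None):
    """Noisy (or SE-enhanced) input feeds the encoder; CLEAN speech provides the targets."""
    rng = np.random.default_rng() if rng is None else rng
    x = (se(noisy) if se is not None else noisy).copy()  # SE front end is optional
    n_mask = max(2, int(mask_prob * len(x)))
    mask_idx = np.sort(rng.choice(len(x), size=n_mask, replace=False))
    x[mask_idx] = 0.0  # zero vector as a toy mask embedding
    context = encoder(x)
    targets, _ = quantize(clean, codebook)  # targets are quantized clean speech
    return contrastive_loss(context[mask_idx], targets[mask_idx])


# Toy usage on random "features" (all shapes and modules are hypothetical).
rng = np.random.default_rng(0)
T, D, V = 50, 16, 32
clean = rng.standard_normal((T, D))
noisy = clean + 0.5 * rng.standard_normal((T, D))
codebook = rng.standard_normal((V, D))
W = rng.standard_normal((D, D)) / np.sqrt(D)
S = 0.9 * np.eye(D)  # hypothetical linear "SE" module


def encoder(x):
    # Toy context network: mixes each frame with its neighbours so masked frames get context.
    return np.tanh((x + np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)) @ W)


loss_noisy = joint_pretrain_loss(noisy, clean, codebook, encoder, rng=rng)
loss_se = joint_pretrain_loss(noisy, clean, codebook, encoder, se=lambda x: x @ S, rng=rng)
print(loss_noisy, loss_se)
```

The two calls mirror the two input options named in the abstract (raw noisy features versus SE output); in both cases the targets are derived from clean speech, which is what ties the SE output to what the recognizer-side representation should encode.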
Related papers
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024)
- How Does End-to-end Speech Recognition Training Impact Speech Enhancement Artifacts? (2023)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)
- SNRi Target Training For Joint Speech Enhancement And Recognition (2021)
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021)
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018)
- Self-supervised Learning Based Monaural Speech Enhancement With Multi-task Pre-training (2021)
- Robust Data2vec: Noise-robust Speech Representation Learning For ASR By Combining Regression And Improved Contrastive Learning (2022)