A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition
2022 · Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, et al.
Abstract
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations for automatic speech recognition (ASR). It has been shown that wav2vec2.0 is robust to domain shift, but its robustness to noise remains unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 experimentally. We observe that wav2vec2.0 pre-trained on noisy data learns good representations and thus improves ASR performance on the noisy test set, but at the cost of degraded performance on the clean test set. To avoid this trade-off, we propose an enhanced wav2vec2.0 model. Specifically, the noisy speech and its clean counterpart are fed into the same feature encoder, and the clean speech provides the training targets for the model. Experimental results reveal that the proposed method can not only improve ASR performance on the noisy test set, surpassing the original wav2vec2.0, but also ensure
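The training setup described in the abstract can be illustrated with a small, dependency-free sketch. This is one plausible reading of the abstract under several assumptions, not the authors' implementation: the shared feature encoder is a hypothetical one-layer linear + ReLU map, the context network (Transformer) and masking are omitted so that encoder outputs of the noisy frames stand in for context vectors, the quantizer is a nearest-codeword lookup swapped in for wav2vec2.0's Gumbel-softmax quantizer, and distractors are drawn from other time steps of the same utterance. The loss follows the wav2vec2.0-style contrastive objective (cosine similarity scaled by a temperature `kappa`, with the clean target as the positive). All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def feature_encoder(frame: Vector, weights: Matrix) -> Vector:
    """Hypothetical shared encoder: one linear layer + ReLU (stand-in for the CNN)."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]


def quantize(z: Vector, codebook: Matrix) -> Vector:
    """Nearest-codeword lookup (assumption: replaces the Gumbel-softmax quantizer)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(z, c)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def contrastive_loss(c: Vector, positive: Vector, distractors: Matrix,
                     kappa: float = 0.1) -> float:
    """-log of the softmax probability assigned to the positive target."""
    logits = [cosine(c, q) / kappa for q in [positive] + distractors]
    m = max(logits)  # shift by the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def noisy_clean_loss(noisy: Matrix, clean: Matrix, weights: Matrix,
                     codebook: Matrix, num_distractors: int = 2,
                     seed: int = 0) -> float:
    """Average contrastive loss: noisy frames give contexts, clean frames give targets."""
    rng = random.Random(seed)
    # Both versions of the utterance pass through the SAME encoder weights.
    contexts = [feature_encoder(f, weights) for f in noisy]  # context network omitted
    targets = [quantize(feature_encoder(f, weights), codebook) for f in clean]
    total = 0.0
    for t, c in enumerate(contexts):
        # Distractors: clean targets from other time steps of the same utterance.
        others = [q for i, q in enumerate(targets) if i != t]
        distractors = rng.sample(others, min(num_distractors, len(others)))
        total += contrastive_loss(c, targets[t], distractors)
    return total / len(contexts)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]         # identity encoder, for illustration only
    codebook = [[1.0, 0.0], [0.0, 1.0]]  # toy two-entry codebook
    noisy = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
    clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(noisy_clean_loss(noisy, clean, W, codebook))
```

Because both branches call `feature_encoder` with the same `weights`, noisy and clean inputs are mapped into one feature space, which is what allows targets derived from clean speech to supervise the contexts computed from noisy speech. Subtracting the maximum logit before exponentiating is a standard log-sum-exp trick that keeps the loss stable when `kappa` is small.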
Related papers
- Rep2wav: Noise Robust Text-to-speech Using Self-supervised Representations (2023) · 0.00
- Robust Data2vec: Noise-robust Speech Representation Learning For ASR By Combining Regression And Improved Contrastive Learning (2022) · 9.76
- Wav2vec: Unsupervised Pre-training For Speech Recognition (2019) · 0.00
- Wav2vec-s: Semi-supervised Pre-training For Low-resource ASR (2021) · 7.50
- Wav2vec-switch: Contrastive Learning From Original-noisy Speech Pairs For Robust Speech Recognition (2021) · 12.93
- Robust Wav2vec 2.0: Analyzing Domain Shift In Self-supervised Pre-training (2021) · 25.07
- Multichannel Av-wav2vec2: A Framework For Learning Multichannel Multi-modal Speech Representation (2024) · 7.16
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020) · 0.00