FAT-HuBERT: Front-end Adaptive Training Of Hidden-unit BERT For Distortion-invariant Robust Speech Recognition
2023 · Dongning Yang, Wei Wang, Yanmin Qian
Abstract
Advancements in monaural speech enhancement (SE) techniques have greatly improved the perceptual quality of speech. However, integrating these techniques into automatic speech recognition (ASR) systems has not yielded the expected performance gains, primarily due to the introduction of distortions during the SE process. In this paper, we propose a novel approach called FAT-HuBERT, which leverages distortion-invariant self-supervised learning (SSL) to enhance the robustness of ASR. To address the distortions introduced by the SE frontends, we introduce layer-wise fusion modules that incorporate features extracted from both observed noisy signals and enhanced signals. During training, the SE frontend is randomly selected from a pool of models. We evaluate the performance of FAT-HuBERT on simulated noisy speech generated from LibriSpeech as well as real-world noisy speech from the CHiME-4 1-channel dataset. The experimental results demonstrate a significant relative reduction in word erro
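The two training-time ideas named in the abstract, drawing the SE frontend at random from a pool and fusing features of the noisy and enhanced signals, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three toy frontends, the names `FRONTEND_POOL`, `pick_frontend`, `fuse`, and `training_step_inputs`, and the weighted-sum fusion rule. The abstract does not specify how the fusion modules combine the two feature streams, and the sketch fuses once on plain lists rather than layer by layer inside a HuBERT encoder.

```python
import random
from typing import Callable, List, Sequence

Frame = List[float]
Frontend = Callable[[Sequence[float]], Frame]


# Toy stand-ins for speech enhancement (SE) frontends. Real frontends are
# neural enhancement models; these exist only to make the sketch runnable.
def identity_frontend(x: Sequence[float]) -> Frame:
    return list(x)


def attenuating_frontend(x: Sequence[float]) -> Frame:
    return [0.5 * v for v in x]


def clipping_frontend(x: Sequence[float]) -> Frame:
    return [max(-1.0, min(1.0, v)) for v in x]


# Hypothetical pool of frontends to sample from during training.
FRONTEND_POOL: List[Frontend] = [
    identity_frontend,
    attenuating_frontend,
    clipping_frontend,
]


def pick_frontend(rng: random.Random) -> Frontend:
    """Draw one SE frontend uniformly at random from the pool."""
    return rng.choice(FRONTEND_POOL)


def fuse(noisy_feat: Sequence[float],
         enhanced_feat: Sequence[float],
         weight: float) -> Frame:
    """Assumed fusion rule: a convex combination of the two feature streams.

    weight = 1.0 keeps only the noisy features; weight = 0.0 keeps only
    the enhanced features.
    """
    if len(noisy_feat) != len(enhanced_feat):
        raise ValueError("feature streams must have the same length")
    return [weight * n + (1.0 - weight) * e
            for n, e in zip(noisy_feat, enhanced_feat)]


def training_step_inputs(noisy: Sequence[float],
                         rng: random.Random,
                         weight: float = 0.5) -> Frame:
    """Build fused features for one training step with a random frontend."""
    frontend = pick_frontend(rng)
    enhanced = frontend(noisy)
    return fuse(noisy, enhanced, weight)
```

Calling `training_step_inputs(noisy, random.Random(seed))` once per step exposes the downstream model to a different frontend's distortions each time, which is the intuition behind sampling from a pool; a learned fusion module would replace the fixed `weight` in practice.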
Related papers
- Fast-HuBERT: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023) · 0.00
- An Adapter Based Multi-label Pre-training For Speech Separation And Enhancement (2022) · 7.50
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022) · 10.97
- MS-HuBERT: Mitigating Pre-training And Inference Mismatch In Masked Language Modelling Methods For Learning Speech Representations (2024) · 4.52
- Speech And Noise Dual-stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition (2023) · 5.24
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024) · 7.50
- Bridging The Gap Between Monaural Speech Enhancement And Recognition With Distortion-independent Acoustic Modeling (2019) · 7.50
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023) · 6.34