Wav-BERT: Cooperative Acoustic And Linguistic Representation Learning For Low-resource Speech Recognition
2021 · Guolin Zheng, Yubei Xiao, Ke Gong, et al.
Abstract
Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to solve the representation discrepancy of speech and text is unexplored, which hinders the utilization of acoustic and linguistic information. Moreover, previous works simply replace the embedding layer of the pre-trained language model with the acoustic features, which may cause the catastrophic forgetting problem. In this work, we introduce Wav-BERT, a cooperative acoustic and linguistic representation learning method to fuse and utilize the contextual information of speech and text. Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework. A Representation Aggregation Modul
Related papers
- WaBERT: A Low-resource End-to-end Model For Spoken Language Understanding And Speech-to-BERT Alignment (2022) · 0.00
- Improving Non-autoregressive End-to-end Speech Recognition With Pre-trained Acoustic And Language Models (2022) · 10.07
- On Scaling Contrastive Representations For Low-resource Speech Recognition (2021) · 3.58
- UniWav: Towards Unified Pre-training For Speech Representation Learning And Generation (2025) · 0.00
- W2v-BERT: Combining Contrastive Learning And Masked Language Modeling For Self-supervised Speech Pre-training (2021) · 17.78
- Bidirectional Representations For Low Resource Spoken Language Understanding (2022) · 0.00
- Whisper Turns Stronger: Augmenting Wav2vec 2.0 For Superior ASR In Low-resource Languages (2024) · 0.00
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020) · 0.00