Self-supervised Learning With Random-projection Quantizer For Speech Recognition
2022 · Chung-Cheng Chiu, James Qin, Yu Zhang, et al.
Abstract
We present a simple and effective self-supervised learning approach for speech recognition. The approach trains a model to predict masked speech signals in the form of discrete labels generated by a random-projection quantizer. In particular, the quantizer projects speech inputs with a randomly initialized matrix and performs a nearest-neighbor lookup in a randomly initialized codebook. Neither the matrix nor the codebook is updated during self-supervised learning. Because the random-projection quantizer is not trained and is separate from the speech recognition model, the design is flexible and compatible with universal speech recognition architectures. On LibriSpeech, our approach achieves word error rates similar to those of previous self-supervised work with non-streaming models, and provides lower word error rates and latency than wav2vec 2.0 and w2v-BERT with streaming models. On multilingual tasks the approach also provides significant improvement over w
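The quantizer described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the dimensions, the Gaussian initialization, and the use of squared Euclidean distance for the nearest-neighbor lookup are all assumptions made for the example. The only properties taken from the abstract are that the projection matrix and codebook are randomly initialized, are never updated, and that labels come from a nearest-neighbor lookup.

```python
import numpy as np


def make_quantizer(input_dim, code_dim, codebook_size, seed=0):
    """Create a frozen random projection matrix and a frozen random codebook.

    Gaussian initialization is an assumption for this sketch; the abstract
    only states that both are randomly initialized and never trained.
    """
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((input_dim, code_dim))
    codebook = rng.standard_normal((codebook_size, code_dim))
    return projection, codebook


def quantize(frames, projection, codebook):
    """Map each input frame to the index of its nearest codebook entry.

    frames:     (num_frames, input_dim) speech features
    projection: (input_dim, code_dim) fixed random matrix
    codebook:   (codebook_size, code_dim) fixed random vectors
    returns:    (num_frames,) integer labels
    """
    projected = frames @ projection
    # Squared Euclidean distance from every projected frame to every code
    # (the distance metric is an assumption of this sketch).
    diffs = projected[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Hypothetical sizes: 10 frames of 80-dimensional features, 32 codes.
projection, codebook = make_quantizer(input_dim=80, code_dim=16, codebook_size=32)
frames = np.random.default_rng(1).standard_normal((10, 80))
labels = quantize(frames, projection, codebook)
```

In a pre-training setup along the lines the abstract describes, labels like these would serve as prediction targets at the masked positions, while `projection` and `codebook` stay fixed throughout.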
Related papers
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations (2019)
- Speech Enhancement Using Self-supervised Pre-trained Model And Vector Quantization (2022)
- Unsupervised Speech Recognition (2021)
- Towards Unsupervised Phone And Word Segmentation Using Self-supervised Vector-quantized Neural Networks (2020)
- Chunk Based Speech Pre-training With High Resolution Finite Scalar Quantization (2025)
- Towards Unsupervised Speech Recognition And Synthesis With Quantized Speech Representation Learning (2019)
- On The Impact Of Quantization And Pruning Of Self-supervised Speech Models For Downstream Speech Recognition Tasks "in-the-wild" (2023)