Wav2vec: Unsupervised Pre-training For Speech Recognition
2019 · Steffen Schneider, Alexei Baevski, Ronan Collobert, et al.
Abstract
We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data, and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce the WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data are available. Our approach achieves 2.43% WER on the nov92 test set. This outperforms Deep Speech 2, the best reported character-based system in the literature, while using two orders of magnitude less labeled training data.
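To make the "noise contrastive binary classification task" more concrete, here is a minimal NumPy sketch of such a loss for a single prediction step. It is an illustration under stated assumptions, not the authors' implementation: the function and argument names (`contrastive_binary_loss`, `context`, `target`, `distractors`) are hypothetical, the vectors stand in for the outputs of the convolutional network, and averaging over distractors is an assumed weighting choice that the abstract does not specify.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def contrastive_binary_loss(context, target, distractors):
    """Binary classification loss: true future sample vs. distractors.

    context:     (d,) vector summarizing past audio (hypothetical name)
    target:      (d,) representation of the true future frame
    distractors: (n, d) representations sampled from elsewhere (assumption)
    """
    pos_score = context @ target        # scalar similarity to the true sample
    neg_scores = distractors @ context  # (n,) similarities to distractors
    # The true sample should be classified as 1 and each distractor as 0.
    # Averaging the distractor term is an assumed weighting.
    return -(log_sigmoid(pos_score) + log_sigmoid(-neg_scores).mean())


context = np.array([1.0, 0.0])
good = contrastive_binary_loss(context, np.array([5.0, 0.0]), np.array([[-5.0, 0.0]]))
bad = contrastive_binary_loss(context, np.array([-5.0, 0.0]), np.array([[5.0, 0.0]]))
print(good < bad)
```

The loss is small when the context scores the true future sample highly and the distractors poorly, and large in the reverse case, which is the signal that lets the network learn from unlabeled audio.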
Related papers
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Unsupervised Speech Recognition (2021)
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022)
- Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations (2019)
- Wav2vec-s: Semi-supervised Pre-training For Low-resource ASR (2021)
- Ccc-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)
- Vec2wav 2.0: Advancing Voice Conversion Via Discrete Token Vocoders (2024)
- Exploring Wav2vec 2.0 On Speaker Verification And Language Identification (2020)