Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations
2019 · Alexei Baevski, Steffen Schneider, Michael Auli
Abstract
We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community that require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
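The two quantization options named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `kmeans_quantize` and `gumbel_softmax`, the toy codebook, and the temperature value are all assumptions chosen for illustration, and training-time details such as how the codebook is learned are omitted.

```python
import math
import random


def kmeans_quantize(frames, codebook):
    """Map each dense frame to the index of its nearest codebook vector.

    Hypothetical helper: uses squared Euclidean distance, as in a
    k-means assignment step.
    """
    indices = []
    for frame in frames:
        distances = [
            sum((x - c) ** 2 for x, c in zip(frame, code))
            for code in codebook
        ]
        indices.append(distances.index(min(distances)))
    return indices


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed one-hot sample over codebook entries from logits.

    Hypothetical helper: adds Gumbel(0, 1) noise to each logit, then
    applies a temperature-scaled softmax.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data (assumed): three 2-dimensional codebook vectors and frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3]]
tokens = kmeans_quantize(frames, codebook)

# Gumbel-Softmax route: the discrete token is the argmax of the sample.
probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
gumbel_token = probs.index(max(probs))
```

Either route turns each dense frame into an integer index, which is what lets models that expect discrete tokens, such as BERT, consume the audio representation.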
Related papers
- Wav2vec: Unsupervised Pre-training For Speech Recognition (2019)
- Vec2wav 2.0: Advancing Voice Conversion Via Discrete Token Vocoders (2024)
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Ccc-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022)
- Exploring Wav2vec 2.0 On Speaker Verification And Language Identification (2020)
- Unsupervised Speech Recognition (2021)
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020)