Unsupervised Cross-lingual Representation Learning For Speech Recognition
2020 · Alexis Conneau, Alexei Baevski, Ronan Collobert, et al.
Abstract
This paper presents XLSR, which learns cross-lingual speech representations by pretraining a single model on the raw waveform of speech in multiple languages. We build on wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across languages. The resulting model is fine-tuned on labeled data, and experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining. On the CommonVoice benchmark, XLSR achieves a relative phoneme error rate reduction of 72% compared to the best known results. On BABEL, our approach reduces word error rate by 16% relative to a comparable system. Our approach enables a single multilingual speech recognition model that is competitive with strong individual models. Analysis shows that the latent discrete speech representations are shared across languages, with increased sharing for related languages. We hope to catalyze rese
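The pretraining objective the abstract refers to asks the model, at each masked time step, to pick out the true quantized latent among distractors; in wav2vec 2.0 the distractors are quantized latents from other masked steps of the same utterance, scored by cosine similarity divided by a temperature. The following is a minimal pure-Python sketch of that loss for a single masked step, under stated assumptions: the function names are hypothetical, the 2-D vectors are toy stand-ins for real representations, the temperature of 0.1 is an illustrative assumption, and the masking, the shared cross-lingual quantizer, and any auxiliary losses are omitted.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    context: Sequence[float],
    target: Sequence[float],
    distractors: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Contrastive (InfoNCE-style) loss for one masked time step.

    context:     model output at the masked step (hypothetical toy vector)
    target:      true quantized latent for that step
    distractors: quantized latents taken from other masked steps
    """
    candidates = [target] + list(distractors)
    # Temperature-scaled cosine similarity against every candidate.
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of choosing the true target (index 0).
    return log_denom - logits[0]


if __name__ == "__main__":
    c = [1.0, 0.0]
    aligned = contrastive_loss(c, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    misaligned = contrastive_loss(c, [0.0, 1.0], [[1.0, 0.0]])
    print(f"aligned: {aligned:.6f}, misaligned: {misaligned:.6f}")
```

The loss is near zero when the context vector points at the true quantized latent and grows when a distractor is more similar; minimizing it over many masked steps is what pushes the context representations toward the discrete codes that XLSR shares across languages.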
Related papers
- Language Adaptive Cross-lingual Speech Representation Learning With Sparse Sharing Sub-networks (2022)
- XLST: Cross-lingual Self-training To Learn Multilingual Representation For Low Resource Speech Recognition (2021)
- Improved Self-supervised Multilingual Speech Representation Learning Combined With Auxiliary Language Information (2022)
- CLSRIL-23: Cross Lingual Speech Representations For Indic Languages (2021)
- XLAVS-R: Cross-lingual Audio-visual Speech Representation Learning For Noise-robust Speech Perception (2024)
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- An Adapter Based Pre-training For Efficient And Scalable Self-supervised Speech Representation Learning (2021)
- Multilingual Self-supervised Speech Representations Improve The Speech Recognition Of Low-resource African Languages With Codeswitching (2023)