CLSRIL-23: Cross Lingual Speech Representations For Indic Languages
2021 · Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, et al.
Abstract
We present CLSRIL-23, a self-supervised, pre-trained audio model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across all languages. We compare the language-wise loss during pretraining to assess the effects of monolingual and multilingual pretraining. Performance on some downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual training, both in learning speech representations that encode the phonetic similarity of languages and in performance on downstream tasks. A decrease of 5% in WER and 9.5% in CER is observed when a multilingual pretrained model is used for fine-tuning in Hindi. All the code and models are also open-sourced. CLSRIL-23 is a model t
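The abstract reports results as word error rate (WER) and character error rate (CER). As a rough illustration of how these metrics are usually computed, here is a minimal sketch based on Levenshtein edit distance. It is not the authors' evaluation code; the function names `edit_distance`, `wer`, and `cer` are chosen here for illustration, and the sketch assumes whitespace tokenization for words and one Unicode code point per character (spaces included).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))  # one of three words is wrong
    print(cer("kitten", "sitting"))           # three edits over six characters
```

For Indic scripts, the CER value depends on how characters are segmented (code points versus grapheme clusters), so figures are comparable only under the same convention.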
Related papers
- Unsupervised Cross-lingual Representation Learning For Speech Recognition (2020)
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- IndicVoices-R: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS (2024)
- Improved Self-supervised Multilingual Speech Representation Learning Combined With Auxiliary Language Information (2022)
- CLASP: Contrastive Language-speech Pretraining For Multilingual Multimodal Information Retrieval (2024)
- XLST: Cross-lingual Self-training To Learn Multilingual Representation For Low Resource Speech Recognition (2021)
- Multilingual Speech Recognition With A Single End-to-end Model (2017)
- An Adapter Based Pre-training For Efficient And Scalable Self-supervised Speech Representation Learning (2021)