Accidental Learners: Spoken Language Identification In Multilingual Self-supervised Models
2022 · Travis M. Bartley, Fei Jia, Krishna C. Puvvada, et al.
Abstract
In this paper, we extend previous self-supervised approaches to language identification by experimenting with a Conformer-based architecture in a multilingual pre-training paradigm. We find that pre-trained speech models encode language-discriminative information best in their lower layers. Further, we demonstrate that embeddings obtained from these layers are robust enough to classify unseen languages and to handle different acoustic environments without additional training. After fine-tuning a pre-trained Conformer model on the VoxLingua107 dataset, we achieve results similar to current state-of-the-art systems for language identification. Moreover, our model accomplishes this with 5x fewer parameters. We open-source the model through the NVIDIA NeMo toolkit.
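As a rough illustration of the layer-wise analysis the abstract describes, the sketch below scores each layer's embeddings with a simple probe. It is not the authors' code. Mean pooling over time and a nearest-centroid classifier are assumptions chosen for brevity, all function names are hypothetical, and the per-layer frame embeddings are assumed to have been extracted from a pre-trained model beforehand (extraction is not shown).

```python
import numpy as np


def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse (time, dim) frame embeddings into one utterance vector."""
    return frames.mean(axis=0)


def fit_centroids(X: np.ndarray, y: np.ndarray):
    """Compute one mean vector per language label."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(X: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each utterance to the label of its nearest centroid."""
    # Squared Euclidean distance from every utterance to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return labels[dists.argmin(axis=1)]


def probe_accuracy(train_X, train_y, test_X, test_y) -> float:
    """Fit the probe on training utterances and score it on held-out ones."""
    labels, centroids = fit_centroids(train_X, train_y)
    return float((predict(test_X, labels, centroids) == test_y).mean())


def probe_layers(layers: dict, y: np.ndarray, train_idx, test_idx) -> dict:
    """Run the same probe on each layer's utterance embeddings.

    `layers` maps a layer name to an (n_utterances, dim) array of
    pooled embeddings (hypothetical input format).
    """
    return {
        name: probe_accuracy(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
        for name, X in layers.items()
    }
```

Comparing the accuracies returned by `probe_layers` across layers is one way to see where language information is most linearly accessible; a stronger probe (for example, logistic regression) could be swapped in without changing the overall procedure.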
Related papers
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- Universal Paralinguistic Speech Representations Using Self-supervised Conformers (2021)
- Attentive Temporal Pooling For Conformer-based Streaming Language Identification In Long-form Speech (2022)
- Pretraining Approaches For Spoken Language Recognition: Taltech Submission To The OLR 2021 Challenge (2022)
- Conformer-based Self-supervised Learning For Non-speech Audio Tasks (2021)
- Improved Self-supervised Multilingual Speech Representation Learning Combined With Auxiliary Language Information (2022)
- NEST: Self-supervised Fast Conformer As All-purpose Seasoning To Speech Processing Tasks (2024)
- Enhancing Neural Spoken Language Recognition: An Exploration With Multilingual Datasets (2025)