Cross-corpora Language Recognition: A Preliminary Investigation With Indian Languages
2021 · Spandan Dey, Goutam Saha, Md Sahidullah
Abstract
In this paper, we conduct one of the first studies of cross-corpora performance evaluation for the spoken language identification (LID) problem. Cross-corpora evaluation has not been explored much in LID research, especially for Indian languages. We select three Indian spoken language corpora: IIITH-ILSC, LDC South Asian, and IITKGP-MLILSC. For each corpus, LID systems are trained using a state-of-the-art time-delay neural network (TDNN) based architecture with MFCC features. We observe that LID performance degrades drastically in cross-corpora evaluation. For example, the system trained on the IIITH-ILSC corpus shows an average EER of 11.80% when evaluated on the same corpus and 43.34% when evaluated on the LDC South Asian corpus. Our preliminary analysis shows significant differences among these corpora in terms of mismatch in the long-term average spectrum (LTAS) and signal-to-noise ratio (SNR). Subsequently, we apply different feature level compensation
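The equal error rate (EER) quoted above is the operating point at which the miss rate equals the false-alarm rate. The following is a minimal sketch of how such a figure could be computed for one language from detection scores; it is not the authors' evaluation code. The function name `equal_error_rate` and the one-vs-rest framing (scores for trials of the target language versus scores for all other languages) are assumptions made for illustration, and an "average EER" would then be the mean of this value over languages.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER for one language (hypothetical helper, not from the paper).

    target_scores: scores for trials where the language is truly present.
    nontarget_scores: scores for trials of every other language.
    Higher scores are assumed to mean "more likely the target language".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # Miss rate: target trials scored below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: non-target trials scored at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their midpoint as the EER estimate.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))
```

With a finite set of trials the two error curves rarely cross exactly, so the midpoint at the closest threshold is only an approximation; evaluation toolkits often interpolate instead.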
Related papers
- Cross-domain Adaptation Of Spoken Language Identification For Related Languages: The Curious Case Of Slavic Languages (2020)
- Joint Language Identification Of Code-switching Speech Using Attention Based E2E Network (2019)
- Enhancing Neural Spoken Language Recognition: An Exploration With Multilingual Datasets (2025)
- Identification/segmentation Of Indian Regional Languages With Singular Value Decomposition Based Feature Embedding (2020)
- CLSRIL-23: Cross Lingual Speech Representations For Indic Languages (2021)
- Investigating The Impact Of Cross-lingual Acoustic-phonetic Similarities On Multilingual Speech Recognition (2022)
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- IndicVoices-R: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS (2024)