Spoken Language Identification Using Convnets
2019 · Sarthak, Shikhar Shukla, Govind Mittal
Abstract
Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To identify languages, we can adopt either an implicit approach, where only the speech for a language is available, or an explicit one, where the speech is accompanied by its corresponding transcript. This paper focuses on the implicit approach because transcribed data is absent. We benchmark existing models and propose a new attention-based model for language identification that uses log-Mel spectrogram images as input. We also present the effectiveness of raw waveforms as features to neural network models for LI tasks. For training and evaluation of models, we classified six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3% obtained from the
Related papers
- Is Attention Always Needed? A Case Study On Language Identification From Speech (2021)
- Multi-language Identification Using Convolutional Recurrent Neural Network (2016)
- End-to-end Language Identification Using Multi-head Self-attention And 1D Convolutional Neural Networks (2021)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- Joint Language Identification Of Code-switching Speech Using Attention Based E2E Network (2019)
- Phonetic Temporal Neural Model For Language Identification (2017)
- Accidental Learners: Spoken Language Identification In Multilingual Self-supervised Models (2022)