Is Attention Always Needed? A Case Study On Language Identification From Speech
2021 · Atanu Mandal, Santanu Pal, Indranil Dutta, et al.
Abstract
Language Identification (LID) is a crucial preliminary step in Automatic Speech Recognition (ASR): it identifies the spoken language from audio samples. Contemporary systems that can process speech in multiple languages require users to explicitly designate one or more languages before use. LID plays a significant role in multilingual settings where an ASR system cannot recognize the spoken language, a situation that otherwise leads to failed speech recognition. The present study introduces a convolutional recurrent neural network (CRNN)-based LID system that operates on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples. Furthermore, we replicate certain state-of-the-art methods, specifically a Convolutional Neural Network (CNN) and an attention-based Convolutional Recurrent Neural Network (CRNN with attention), and compare them with our CRNN-based approach. We conducted co
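To make the pipeline in the abstract concrete, here is a minimal NumPy sketch of MFCC extraction followed by a CRNN forward pass, with a switch for attention pooling to mirror the "CRNN with attention" comparison. It is not the authors' implementation. The frame settings (25 ms window, 10 ms hop at 16 kHz), the single convolution layer, the plain tanh recurrent layer (the paper's recurrent cell type is not stated here), the dot-product attention pooling, and the helper names `mfcc`, `init_params`, and `crnn_forward` are all assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC matrix (assumes len(signal) >= frame_len)."""
    # Pre-emphasis boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping Hamming-windowed frames (25 ms / 10 ms at 16 kHz).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep the first n_mfcc.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(n_mfcc=13, n_filters=32, kernel=3, hidden=64, n_langs=3, seed=0):
    """Random (untrained) weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    return {
        "W_conv": rng.normal(0, 0.1, (n_filters, kernel, n_mfcc)),
        "b_conv": np.zeros(n_filters),
        "W_x": rng.normal(0, 0.1, (hidden, n_filters)),
        "W_h": rng.normal(0, 0.1, (hidden, hidden)),
        "b_h": np.zeros(hidden),
        "w_att": rng.normal(0, 0.1, hidden),
        "W_out": rng.normal(0, 0.1, (n_langs, hidden)),
        "b_out": np.zeros(n_langs),
    }

def crnn_forward(feats, p, use_attention=False):
    """Map a (T, n_mfcc) MFCC matrix to per-language probabilities."""
    # Per-coefficient mean/variance normalization over the utterance.
    x = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    # 1-D convolution over time ("valid" padding) followed by ReLU.
    k = p["W_conv"].shape[1]
    windows = np.stack([x[t:t + k] for t in range(len(x) - k + 1)])
    conv = np.maximum(np.einsum("tkc,fkc->tf", windows, p["W_conv"]) + p["b_conv"], 0.0)
    # Simple tanh recurrent layer over the convolved frames.
    h = np.zeros(p["W_h"].shape[0])
    states = []
    for frame in conv:
        h = np.tanh(p["W_x"] @ frame + p["W_h"] @ h + p["b_h"])
        states.append(h)
    states = np.array(states)
    if use_attention:
        # Attention pooling: softmax-weighted average of all hidden states.
        weights = softmax(states @ p["w_att"])
        pooled = weights @ states
    else:
        # No attention: classify from the final hidden state only.
        pooled = states[-1]
    return softmax(p["W_out"] @ pooled + p["b_out"])

rng = np.random.default_rng(1)
audio = rng.standard_normal(16000)  # 1 s of noise standing in for 16 kHz speech
feats = mfcc(audio)
print(feats.shape)  # (98, 13)
params = init_params()
print(crnn_forward(feats, params))
print(crnn_forward(feats, params, use_attention=True))
```

In this sketch, dropping attention simply means classifying from the last hidden state instead of an attention-weighted average over all frames; whether that extra pooling step pays off is the empirical question the paper's title raises.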
Related papers
- Joint Language Identification Of Code-switching Speech Using Attention Based E2E Network (2019) · 5.24
- Spoken Language Identification Using Convnets (2019) · 9.59
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020) · 0.00
- Multi-language Identification Using Convolutional Recurrent Neural Network (2016) · 13.88
- Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019) · 11.67
- Streaming End-to-end Bilingual ASR Systems With Joint Language Identification (2020) · 0.00
- Joint Unsupervised And Supervised Learning For Context-aware Language Identification (2023) · 2.26
- End-to-end Language Identification Using Multi-head Self-attention And 1D Convolutional Neural Networks (2021) · 0.00