Spoken Language Identification System For English-Mandarin Code-switching Child-directed Speech
2023 · Shashi Kant Gupta, Sushant Hiray, Prashant Kukde
Abstract
This work improves a spoken language identification (LangId) system for a challenge on building robust language identification systems that remain reliable on non-standard, Singaporean-accented, spontaneous, code-switched, child-directed speech collected via Zoom. We propose a two-stage, encoder-decoder, end-to-end (E2E) model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with a global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The model has about 22.1 M parameters in total, which is relatively lightweight compared with large-scale pre-trained speech models. We achieved an equal error rate (EER) of 15.6% in the closed track and 11.1% in the open track, against 22.1% for the baseline system. We also curated additional LangId data from YouTube videos featuring Singaporean speakers, which will be released for public use.
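To illustrate how attentive temporal pooling can turn a variable-length sequence of frame features into a fixed-length vector, here is a minimal sketch in plain Python. It assumes one common form of the technique: a scalar score per frame, a softmax over time, and a weighted mean. The abstract does not specify the exact formulation used in the paper, so the function name `attentive_temporal_pooling`, the scoring vector `w`, and the toy inputs are all hypothetical.

```python
import math
from typing import List, Sequence


def attentive_temporal_pooling(
    frames: Sequence[Sequence[float]], w: Sequence[float]
) -> List[float]:
    """Collapse a (T x D) sequence of frame features into one D-dim vector.

    Assumed formulation (not taken from the paper): each frame gets a scalar
    score (dot product with `w`), the scores are normalised over time with a
    softmax, and the frames are averaged using those weights.
    """
    if not frames:
        raise ValueError("need at least one frame")

    # One attention score per time step.
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]

    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted mean over time: the output size depends only on D, not on T.
    dim = len(frames[0])
    return [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]


# Toy inputs: two clips of different lengths, same feature dimension D = 2.
short_clip = [[1.0, 0.0], [0.0, 1.0]]
long_clip = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
w = [0.3, -0.2]  # hypothetical learned scoring vector

pooled_short = attentive_temporal_pooling(short_clip, w)
pooled_long = attentive_temporal_pooling(long_clip, w)
```

Both `pooled_short` and `pooled_long` have length 2 even though the clips differ in duration, which is the "time-independent" property the abstract refers to. With an all-zero `w` the weights become uniform and the result reduces to plain mean pooling.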
Related papers
- MERLIon CCS Challenge: A English-Mandarin Code-switching Child-directed Speech Corpus For Language Identification And Diarization (2023)
- Joint Language Identification Of Code-switching Speech Using Attention Based E2E Network (2019)
- RNN-Transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition (2020)
- On The End-to-end Solution To Mandarin-English Code-switching Speech Recognition (2018)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021)
- MERaLiON-SpeechEncoder: Towards A Speech Foundation Model For Singapore And Beyond (2024)
- Leveraging Language ID To Calculate Intermediate CTC Loss For Enhanced Code-switching Speech Recognition (2023)