MIT-QCRI Arabic Dialect Identification System For The 2017 Multi-genre Broadcast Challenge
2017 · Suwon Shon, Ahmed Ali, James Glass
Abstract
In order to successfully annotate the Arabic speech content found in open-domain media broadcasts, it is essential to be able to process a diverse set of Arabic dialects. For the 2017 Multi-Genre Broadcast challenge (MGB-3) there were two possible tasks: Arabic speech recognition, and Arabic Dialect Identification (ADI). In this paper, we describe our efforts to create an ADI system for the MGB-3 challenge, with the goal of distinguishing amongst four major Arabic dialects, as well as Modern Standard Arabic. Our research focused on dialect variability and domain mismatches between the training and test domains. In order to achieve a robust ADI system, we explored both Siamese neural network models, to learn similarities and dissimilarities among Arabic dialects, and i-vector post-processing, to adapt to domain mismatches. Both acoustic and linguistic features were used for the final MGB-3 submissions, with the best primary system achieving 75% accuracy on the official 10-hour test set.
Related papers
- UTD-CRSS Submission For MGB-3 Arabic Dialect Identification: Front-end And Back-end Advancements On Broadcast Speech (2017)
- The MGB-2 Challenge: Arabic Multi-dialect Broadcast Media Recognition (2016)
- LSTM-TDNN With Convolutional Front-end For Dialect Identification In The 2019 Multi-genre Broadcast Challenge (2019)
- Dialectal Coverage And Generalization In Arabic Speech Recognition (2024)
- Convolutional Neural Networks And Language Embeddings For End-to-end Dialect Recognition (2018)
- Hybrid Deep Learning And Signal Processing For Arabic Dialect Recognition In Low-resource Settings (2025)
- Classifier Ensembles For Dialect And Language Variety Identification (2018)
- Multi-view Dimensionality Reduction For Dialect Identification Of Arabic Broadcast Speech (2016)