UnibucKernel Reloaded: First Place In Arabic Dialect Identification For The Second Year In A Row
2018 · Andrei M. Butnaru, Radu Tudor Ionescu
Abstract
We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Task of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech or phonetic transcripts, we also use a kernel based on dialectal embeddings generated from audio recordings by the organizers. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Preliminary experiments indicate that KRR provides better classification results. Our approach is shallow and simple, but the empirical results obtained in the 2018 ADI Closed Shared Task show that it achieves the best performance. Furthermore, our top macro-F1 score (58.92%) is significantly better than the second-best score (57.59%) in the 2018 ADI Shared Task, according to the statistical sign
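The pipeline described in the abstract (character p-gram kernels, an embedding kernel, kernel combination, and KRR) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration, not the authors' implementation: the p-gram kernel is a normalized spectrum kernel over p-grams of length 2 to 4, the embedding kernel is a cosine kernel, the kernels are combined by a plain sum, and KRR is trained one-versus-all with a hypothetical regularization value `lam`. The strings and 2-d "embeddings" are made-up toy data.

```python
from collections import Counter
import numpy as np


def pgram_counts(text, p_min=2, p_max=4):
    """Count every character p-gram of length p_min..p_max in text."""
    counts = Counter()
    for p in range(p_min, p_max + 1):
        for i in range(len(text) - p + 1):
            counts[text[i:i + p]] += 1
    return counts


def spectrum_kernel(docs_a, docs_b):
    """Normalized p-gram spectrum kernel (an assumed kernel choice)."""
    feats_a = [pgram_counts(d) for d in docs_a]
    feats_b = [pgram_counts(d) for d in docs_b]
    # L2 norms of the count vectors; "or 1.0" guards against empty documents.
    norm_a = [np.sqrt(sum(c * c for c in f.values())) or 1.0 for f in feats_a]
    norm_b = [np.sqrt(sum(c * c for c in f.values())) or 1.0 for f in feats_b]
    K = np.zeros((len(feats_a), len(feats_b)))
    for i, fa in enumerate(feats_a):
        for j, fb in enumerate(feats_b):
            dot = sum(c * fb[g] for g, c in fa.items() if g in fb)
            K[i, j] = dot / (norm_a[i] * norm_b[j])
    return K


def cosine_kernel(X_a, X_b):
    """Cosine similarity between rows of two embedding matrices."""
    A = X_a / np.linalg.norm(X_a, axis=1, keepdims=True)
    B = X_b / np.linalg.norm(X_b, axis=1, keepdims=True)
    return A @ B.T


def krr_fit(K_train, labels, n_classes, lam=0.01):
    """One-versus-all KRR: solve (K + lam * I) alpha = Y with +1/-1 targets."""
    n = K_train.shape[0]
    Y = -np.ones((n, n_classes))
    Y[np.arange(n), labels] = 1.0
    return np.linalg.solve(K_train + lam * np.eye(n), Y)


def krr_predict(K_test, alpha):
    """Pick the class with the highest regression score."""
    return np.argmax(K_test @ alpha, axis=1)


# Toy data (invented): two classes with disjoint character inventories.
train_texts = ["abab abba", "baab abab", "xyxy yxxy", "yxyx xyxy"]
train_emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
train_labels = np.array([0, 0, 1, 1])
test_texts = ["abba baab", "xyyx yxyx"]
test_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

# Combine kernels by summation (one simple option among several).
K_train = spectrum_kernel(train_texts, train_texts) + cosine_kernel(train_emb, train_emb)
K_test = spectrum_kernel(test_texts, train_texts) + cosine_kernel(test_emb, train_emb)

alpha = krr_fit(K_train, train_labels, n_classes=2)
pred = krr_predict(K_test, alpha)
print(pred)
```

Summing normalized kernels is equivalent to concatenating the corresponding unit-norm feature maps, which keeps any single representation from dominating the combined similarity. A weighted sum with weights tuned on validation data would be a natural variation.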
Related papers
- UTD-CRSS Submission For MGB-3 Arabic Dialect Identification: Front-end And Back-end Advancements On Broadcast Speech (2017)
- MIT-QCRI Arabic Dialect Identification System For The 2017 Multi-genre Broadcast Challenge (2017)
- Can String Kernels Pass The Test Of Time In Native Language Identification? (2017)
- Hybrid Deep Learning And Signal Processing For Arabic Dialect Recognition In Low-resource Settings (2025)
- Classifier Ensembles For Dialect And Language Variety Identification (2018)
- A Deep Learning Approach For Similar Languages, Varieties And Dialects (2019)
- Robust Acoustic Domain Identification With Its Application To Speaker Diarization (2022)
- LSTM-TDNN With Convolutional Front-end For Dialect Identification In The 2019 Multi-genre Broadcast Challenge (2019)