Convolutional Neural Networks And Language Embeddings For End-to-end Dialect Recognition
2018 · Suwon Shon, Ahmed Ali, James Glass
Abstract
Dialect identification (DID) is a special case of general language identification (LID), but a more challenging problem due to the linguistic similarity between dialects. In this paper, we propose an end-to-end DID system and a Siamese neural network to extract language embeddings. We use both acoustic and linguistic features for the DID task on the Arabic dialectal speech dataset: Multi-Genre Broadcast 3 (MGB-3). The end-to-end DID system was trained using three kinds of acoustic features: Mel-Frequency Cepstral Coefficients (MFCCs), log Mel-scale Filter Bank energies (FBANK) and spectrogram energies. We also investigated a dataset augmentation approach to achieve robust performance with limited data resources. Our linguistic feature research focused on learning similarities and dissimilarities between dialects using the Siamese network, so that we can reduce feature dimensionality as well as improve DID performance. The best system using a single feature set achieves 73% accuracy, wh…
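The three acoustic feature types named in the abstract are closely related: spectrogram energies are the log power spectrum of short windowed frames, FBANK applies a bank of triangular mel-spaced filters to that power spectrum before taking the log, and MFCCs are a discrete cosine transform (DCT) of the FBANK energies. The NumPy sketch below illustrates that pipeline under assumed settings (16 kHz audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 40 mel bands, 13 cepstra, HTK-style mel scale). These values and all helper names (`power_spectrogram`, `mel_filterbank`, `dct_ii`, `extract_features`) are hypothetical illustrative choices, not the configuration reported in the paper.

```python
import numpy as np

# Assumed settings for illustration only; not the paper's reported configuration.

def hz_to_mel(hz):
    # HTK-style mel scale (one of several common variants)
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrogram(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512):
    """Frame the signal, apply a Hamming window, return |FFT|^2 per frame."""
    frame_len = int(sr * frame_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)          # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft)) ** 2  # (n_frames, n_fft // 2 + 1)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def extract_features(signal, sr=16000, n_fft=512, n_mels=40, n_ceps=13):
    spec = power_spectrogram(signal, sr=sr, n_fft=n_fft)
    log_spec = np.log(spec + 1e-10)                                   # spectrogram energies
    fbank = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)  # FBANK
    mfcc = dct_ii(fbank, n_ceps)                                      # MFCCs
    return log_spec, fbank, mfcc

# Usage: 1 s of synthetic noise stands in for a real utterance.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
spec, fbank, mfcc = extract_features(audio)
print(spec.shape, fbank.shape, mfcc.shape)  # (98, 257) (98, 40) (98, 13)
```

Because MFCCs are a linear transform of FBANK, the two carry largely the same information; the DCT mainly decorrelates and compresses it, which is one reason end-to-end convolutional models are often trained directly on FBANK or spectrogram inputs.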
Related papers
- LSTM-TDNN With Convolutional Front-end For Dialect Identification In The 2019 Multi-genre Broadcast Challenge (2019)
- Transformer-based Arabic Dialect Identification (2020)
- Advanced Accent/dialect Identification And Accentedness Assessment With Multi-embedding Models And Automatic Speech Recognition (2023)
- Leveraging Native Language Speech For Accent Identification Using Deep Siamese Networks (2017)
- Hybrid Deep Learning And Signal Processing For Arabic Dialect Recognition In Low-resource Settings (2025)
- A Deep Learning Approach For Similar Languages, Varieties And Dialects (2019)
- MIT-QCRI Arabic Dialect Identification System For The 2017 Multi-genre Broadcast Challenge (2017)
- Domain Attentive Fusion For End-to-end Dialect Identification With Unknown Target Domain (2018)