Enhancing Neural Spoken Language Recognition: An Exploration With Multilingual Datasets
2025 · Or Haim Anidjar, Roi Yozevitch
Abstract
In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on effectively capturing language characteristics over extended periods using a specialized pooling layer. We utilized a broad dataset range from Common-Voice, targeting ten languages across Indo-European, Semitic, and East Asian families. The major innovation involved optimizing the architecture of Time Delay Neural Networks. We introduced additional layers and restructured these networks into a funnel shape, enhancing their ability to process complex linguistic patterns. A rigorous grid search determined the optimal settings for these networks, significantly boosting their efficiency in language pattern recognition from audio samples. The model underwent extensive training, including a phase with augmented data, to refine its capabilities. The culmination of these efforts is a highly accurate system, achieving a 97% accuracy rate in langu
Related papers
- Massively Multilingual Adversarial Speech Recognition (2019) 11.93
- A Network Of Deep Neural Networks For Distant Speech Recognition (2017) 10.35
- NeuralMultiling: A Novel Neural Architecture Search For Smartphone Based Multilingual Speaker Verification (2024) 0.00
- On The Effectiveness Of Neural Text Generation Based Data Augmentation For Recognition Of Morphologically Rich Speech (2020) 0.00
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021) 10.61
- Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019) 14.97
- Combining Speakers Of Multiple Languages To Improve Quality Of Neural Voices (2021) 5.24
- Multi-staged Cross-lingual Acoustic Model Adaption For Robust Speech Recognition In Real-world Applications -- A Case Study On German Oral History Interviews (2020) 0.00