Speaker Diarization For Low-resource Languages Through Wav2vec Fine-tuning
2025 · Abdulhady Abas Abdullah, Sarkhel H. Taher Karim, Sara Azad Ahmed, et al.
Abstract
Speaker diarization is a fundamental speech-processing task that partitions an audio stream into segments according to who is speaking. Although state-of-the-art models perform well on high-resource languages, low-resource languages such as Kurdish pose distinct challenges: limited annotated data, multiple dialects, and frequent code-switching. In this study, we address these issues by fine-tuning the Wav2Vec 2.0 self-supervised learning model on a dedicated Kurdish corpus. Through transfer learning, we adapted multilingual representations learned from other languages to capture the phonetic and acoustic characteristics of Kurdish speech. Relative to a baseline method, our approach reduced the diarization error rate (DER) by 7.2% and improved cluster purity by 13%. These findings demonstrate that adapting existing models can significantly improve diarization performance for under-resourced languages. Our work has practical implications for developing trans
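The abstract reports gains in diarization error rate (DER) and cluster purity without defining them. As a rough illustration only, the Python sketch below computes both metrics from frame-level speaker labels using common textbook definitions; it is not the paper's evaluation code. It assumes one speaker label per frame (no overlapping speech), `None` for non-speech, no forgiveness collar around boundaries, and a brute-force optimal speaker mapping. The function names `best_speaker_mapping`, `frame_der`, and `cluster_purity` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Optional, Sequence

# Assumption: one label per frame; None marks a non-speech frame (overlap not modelled).
Label = Optional[str]


def best_speaker_mapping(ref: Sequence[Label], hyp: Sequence[Label]) -> Dict[str, str]:
    """Brute-force the one-to-one hyp->ref mapping that maximises matched speech frames."""
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    # Co-occurrence counts on frames where both reference and hypothesis contain speech.
    overlap = Counter((h, r) for r, h in zip(ref, hyp) if r is not None and h is not None)
    # Padding with None lets a hypothesis cluster stay unmapped (e.g. a spurious extra speaker).
    candidates = ref_spk + [None] * len(hyp_spk)
    best: Dict[str, str] = {}
    best_score = -1
    for assignment in permutations(candidates, len(hyp_spk)):
        score = sum(overlap[(h, r)] for h, r in zip(hyp_spk, assignment) if r is not None)
        if score > best_score:
            best_score = score
            best = {h: r for h, r in zip(hyp_spk, assignment) if r is not None}
    return best


def frame_der(ref: Sequence[Label], hyp: Sequence[Label]) -> float:
    """Frame-level DER = (missed speech + false alarm + speaker confusion) / reference speech."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total_speech = sum(r is not None for r in ref)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")
    mapping = best_speaker_mapping(ref, hyp)
    missed = false_alarm = confusion = 0
    for r, h in zip(ref, hyp):
        if r is not None and h is None:
            missed += 1  # speech the system did not label
        elif r is None and h is not None:
            false_alarm += 1  # system labelled silence as speech
        elif r is not None and mapping.get(h) != r:
            confusion += 1  # both speech, but mapped to the wrong speaker
    return (missed + false_alarm + confusion) / total_speech


def cluster_purity(ref: Sequence[Label], hyp: Sequence[Label]) -> float:
    """Share of jointly-speech frames that match their hypothesis cluster's majority speaker."""
    clusters: Dict[str, Counter] = {}
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            clusters.setdefault(h, Counter())[r] += 1
    n = sum(sum(c.values()) for c in clusters.values())
    if n == 0:
        return 0.0
    return sum(max(c.values()) for c in clusters.values()) / n


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None, "B", "A"]
    hyp = ["s1", "s1", "s2", "s2", "s2", "s2", "s2", None]
    print(f"DER = {frame_der(ref, hyp):.3f}")  # DER = 0.429 (3 errors / 7 speech frames)
    print(f"purity = {cluster_purity(ref, hyp):.3f}")  # purity = 0.833 (5 / 6 frames)
```

Standard scoring tools operate on time segments, apply a collar around speaker boundaries, and account for overlapping speech, so their numbers will differ from this simplified version. For more than a handful of speakers, the factorial brute-force mapping should be replaced by an optimal assignment such as the Hungarian algorithm.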
Related papers
- Speaker Diarization As A Fully Online Learning Problem In MiniVox (2020)
- Joint Training Or Not: An Exploration Of Pre-trained Speech Models In Audio-visual Speaker Diarization (2023)
- Speaker Diarization With LSTM (2017)
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017)
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021)
- Whisper Turns Stronger: Augmenting Wav2vec 2.0 For Superior ASR In Low-resource Languages (2024)
- Central Kurdish Text-to-speech Synthesis With Novel End-to-end Transformer Training (2024)