DIHARD II Is Still Hard: Experimental Results And Discussions From The DKU-LENOVO Team
2020 · Qingjian Lin, Weicheng Cai, Lin Yang, et al.
Abstract
In this paper, we present the system submitted by the DKU-LENOVO team to the second DIHARD Speech Diarization Challenge. Our diarization system consists of multiple modules: voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to improve performance. Our final submission employs ResNet-LSTM based VAD, Deep ResNet based speaker embeddings, LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection brings a further slight improvement. Our proposed system achieves a diarization error rate (DER) of 18.84% in Track 1 and 27.90% in Track 2. Although these results reduce the DER by 27.5% and 31.7% relative to the official baselines, we believe that the diarization task is still very difficult.
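The relative reductions quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes the usual definition of relative reduction, (baseline − system) / baseline, and inverts it to recover the baseline DER implied by each pair of figures. The helper names `relative_reduction` and `implied_baseline` are illustrative and do not come from the paper, and the implied baselines are derived values, not numbers stated in the abstract.

```python
def relative_reduction(baseline_der: float, system_der: float) -> float:
    """Relative DER reduction, expressed as a fraction of the baseline DER."""
    return (baseline_der - system_der) / baseline_der


def implied_baseline(system_der: float, rel_reduction: float) -> float:
    """Baseline DER implied by a system DER and a stated relative reduction."""
    return system_der / (1.0 - rel_reduction)


# Figures quoted in the abstract: (system DER in percent, relative reduction as a fraction).
reported = {
    "Track 1": (18.84, 0.275),
    "Track 2": (27.90, 0.317),
}

for track, (der, reduction) in reported.items():
    baseline = implied_baseline(der, reduction)
    print(f"{track}: system {der:.2f}% DER, implied baseline about {baseline:.1f}% DER")
```

Under that assumption, the quoted figures imply baselines of roughly 26% DER for Track 1 and 41% DER for Track 2, which shows how much absolute error remains even after a relative reduction of about 30%.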
Related papers
- The DKU-Duke-Lenovo System Description For The Third DIHARD Speech Diarization Challenge (2021)
- UWB-NTIS Speaker Diarization System For The DIHARD II 2019 Challenge (2019)
- The Second DIHARD Diarization Challenge: Dataset, Task, And Baselines (2019)
- The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization And X-Vector Clustering Systems Combined By DOVER-Lap (2021)
- The Speed Submission To DIHARD II: Contributions & Lessons Learned (2019)
- The DKU-MSXF Diarization System For The VoxCeleb Speaker Recognition Challenge 2023 (2023)
- The HUAWEI Speaker Diarisation System For The VoxCeleb Speaker Diarisation Challenge (2020)
- Enhancements For Audio-only Diarization Systems (2019)