Enhancements For Audio-only Diarization Systems
2019 · Dimitrios Dimitriadis
Abstract
This paper presents two approaches for improving the most challenging component of a speaker diarization system: speaker clustering. First, we propose a processing step that enhances the input features through temporal smoothing combined with nonlinear filtering. Second, we propose improvements to the Deep Embedded Clustering (DEC) algorithm, a nonlinear feature transformation. The performance of these enhancements is compared with different clustering algorithms, such as UISRNN, k-Means, spectral clustering and x-Means. The evaluation is carried out on three tasks: AMI, DIHARD and an internal meeting transcription task. The proposed approaches assume that the number of speakers and the time segmentation of each audio file are known. Since this work focuses only on the clustering component of diarization, the provided segmentation is assumed to be perfect. Finally, we present how supervision, in the form of given speaker profiles, can fu
Related papers
- End-to-end Speaker Diarization As Post-processing (2020)
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024)
- Highly Efficient Real-time Streaming And Fully On-device Speaker Diarization With Multi-stage Clustering (2022)
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- Reformulating Speaker Diarization As Community Detection With Emphasis On Topological Structure (2022)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020)