Robust Speaker Clustering Using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams
2018 · Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen
Abstract
Speaker diarization (i.e., determining who spoke and when) for multi-speaker naturalistic interactions such as Peer-Led Team Learning (PLTL) sessions is a challenging task. In this study, we propose robust speaker clustering based on a mixture of multivariate von Mises-Fisher distributions. Our diarization pipeline has two stages: (i) ground-truth segmentation; and (ii) the proposed speaker clustering. The ground-truth speech activity information is used to extract i-Vectors from each speech segment. We post-process the i-Vectors with principal component analysis for dimensionality reduction, followed by length normalization. Normalized i-Vectors are high-dimensional unit vectors possessing discriminative directional characteristics. We model the normalized i-Vectors with a mixture model consisting of multivariate von Mises-Fisher distributions. K-means clustering with cosine distance is chosen as the baseline approach. The evaluation data is derived from: (i) the CRSS-PLTL corpus; and (ii) three-meetings s…
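The clustering stage can be illustrated with a minimal pure-Python sketch. It assumes the i-Vectors have already been extracted and PCA-reduced (PCA itself is omitted). It then length-normalizes them and fits a mixture of von Mises-Fisher (vMF) densities f(x; μ, κ) = C_d(κ) exp(κ μᵀx) with soft EM, where C_d(κ) = κ^(d/2−1) / ((2π)^(d/2) I_(d/2−1)(κ)). The closed-form concentration update from Banerjee et al. (2005), the farthest-first initialization, and the `kappa_max` cap are assumptions of this sketch, not necessarily the authors' choices. All function names are hypothetical.

```python
import math
import random


def length_normalize(vectors):
    """Scale each (already PCA-reduced) i-vector to unit L2 norm."""
    normalized = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        normalized.append([x / norm for x in v])
    return normalized


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_bessel_iv(order, x):
    """log I_order(x) for x > 0, from the power series summed in log space."""
    log_half_x = math.log(x / 2.0)
    terms, peak, m = [], -math.inf, 0
    while True:
        t = ((2 * m + order) * log_half_x
             - math.lgamma(m + 1) - math.lgamma(m + order + 1))
        terms.append(t)
        peak = max(peak, t)
        # Once m > x the terms shrink geometrically; stop when negligible.
        if m > x and t < peak - 40.0:
            break
        m += 1
    return _logsumexp(terms)


def log_vmf_normalizer(dim, kappa):
    """log C_d(kappa) for the d-dimensional vMF density."""
    nu = dim / 2.0 - 1.0
    return (nu * math.log(kappa) - (dim / 2.0) * math.log(2.0 * math.pi)
            - log_bessel_iv(nu, kappa))


def fit_vmf_mixture(points, n_clusters, n_iter=50, seed=0, kappa_max=1e4):
    """Soft EM for a vMF mixture on unit vectors.

    Returns (means, kappas, weights, labels). kappa_max is a hypothetical
    numerical safeguard, not a parameter taken from the paper.
    """
    n, dim = len(points), len(points[0])
    rng = random.Random(seed)
    # Farthest-first initialization: a random point, then repeatedly the point
    # with the lowest maximum cosine similarity to the means chosen so far.
    means = [list(points[rng.randrange(n)])]
    while len(means) < n_clusters:
        far = min(points, key=lambda p: max(_dot(p, mu) for mu in means))
        means.append(list(far))
    kappas = [10.0] * n_clusters
    weights = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_norms = [log_vmf_normalizer(dim, kap) for kap in kappas]
        resp = []
        for x in points:
            scores = [math.log(weights[k]) + log_norms[k]
                      + kappas[k] * _dot(means[k], x)
                      for k in range(n_clusters)]
            total = _logsumexp(scores)
            resp.append([math.exp(s - total) for s in scores])
        # M-step: mixing weights, mean directions, concentrations.
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            s = [sum(r[k] * x[j] for r, x in zip(resp, points))
                 for j in range(dim)]
            s_norm = math.sqrt(sum(v * v for v in s))
            if nk < 1e-10 or s_norm < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[k] = nk / n
            means[k] = [v / s_norm for v in s]
            r_bar = min(s_norm / nk, 1.0 - 1e-9)
            # Banerjee et al. (2005) approximation of the concentration.
            kappa = r_bar * (dim - r_bar ** 2) / (1.0 - r_bar ** 2)
            kappas[k] = min(max(kappa, 1e-6), kappa_max)
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return means, kappas, weights, labels
```

With a shared κ and equal mixing weights, the E-step's most probable component is simply the mean with the highest cosine similarity, and the mean update becomes the normalized cluster sum. Under hard assignments this reduces to spherical (cosine) K-means, which is why that method is a natural baseline for the vMF mixture.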
Related papers
- Toeplitz Inverse Covariance Based Robust Speaker Clustering For Naturalistic Audio Streams (2019)
- Target-speaker Voice Activity Detection: A Novel Approach For Multi-speaker Diarization In A Dinner Party Scenario (2020)
- Enhancements For Audio-only Diarization Systems (2019)
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024)
- Hypothesis Clustering And Merging: Novel Multitalker Speech Recognition With Speaker Tokens (2024)
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021)
- Simultaneous Diarization And Separation Of Meetings Through The Integration Of Statistical Mixture Models (2024)
- Multimodal Clustering With Role Induced Constraints For Speaker Diarization (2022)