Toeplitz Inverse Covariance Based Robust Speaker Clustering For Naturalistic Audio Streams
2019 · Harishchandra Dubey, Abhijeet Sangwan, John Hansen
Abstract
Speaker diarization determines "who spoke and when" in an audio stream. In this study, we propose a model-based approach for robust speaker clustering using i-vectors. The i-vectors extracted from different segments of the same speaker are correlated. We model this correlation with a Markov Random Field (MRF) network. Leveraging advancements in MRF modeling, we use a Toeplitz Inverse Covariance (TIC) matrix to represent the MRF correlation network for each speaker. This approach captures the sequential structure of i-vectors (or equivalent speaker turns) belonging to the same speaker in an audio stream. A variant of the standard Expectation Maximization (EM) algorithm is adopted for deriving a closed-form solution using dynamic programming (DP) and the alternating direction method of multipliers (ADMM). Our diarization system has four steps: (1) ground-truth segmentation; (2) i-vector extraction; (3) post-processing (mean subtraction, principal component analysis, and length-normalization); and (
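The post-processing step named in the abstract (mean subtraction, principal component analysis, and length-normalization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `postprocess_ivectors`, the parameter `n_components`, the use of NumPy's SVD for PCA, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np

def postprocess_ivectors(ivectors, n_components):
    """Hypothetical helper: mean-subtract, PCA-project, then length-normalize."""
    X = np.asarray(ivectors, dtype=float)
    # (1) Mean subtraction: center each dimension across all segments.
    X = X - X.mean(axis=0, keepdims=True)
    # (2) PCA via SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    X = X @ Vt[:n_components].T
    # (3) Length normalization: scale every vector to unit Euclidean norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

# Stand-in data (assumption): 50 segments with 20-dimensional i-vectors.
rng = np.random.default_rng(0)
ivecs = rng.normal(size=(50, 20))
processed = postprocess_ivectors(ivecs, n_components=5)
```

After this step every row of `processed` lies on the unit sphere in the reduced space, which is the form a downstream clustering stage would consume.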
Related papers
- Robust Speaker Clustering Using Mixtures Of Von Mises-Fisher Distributions For Naturalistic Audio Streams (2018) 4.52
- A Robust Speaker Clustering Method Based On Discrete Tied Variational Autoencoder (2020) 0.00
- Target-speaker Voice Activity Detection: A Novel Approach For Multi-speaker Diarization In A Dinner Party Scenario (2020) 16.19
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024) 3.58
- Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization (2021) 8.35
- Enhancements For Audio-only Diarization Systems (2019) 0.00
- Target-speaker Voice Activity Detection With Improved I-vector Estimation For Unknown Number Of Speaker (2021) 10.97
- Multi-class Spectral Clustering With Overlaps For Speaker Diarization (2020) 10.35