Tight Integration Of Neural- And Clustering-based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model
2022 · Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata
Abstract
Speaker diarization has been investigated extensively as a central task for meeting analysis. A recent trend shows that integrating end-to-end neural diarization (EEND) with clustering-based diarization is a promising approach for handling realistic conversational data containing overlapped speech and an arbitrarily large number of speakers, and this approach has achieved state-of-the-art results on various tasks. However, the approaches proposed so far have not yet realized tight integration, because the clustering they employ is not optimal in any sense for clustering the speaker embeddings estimated by the EEND module. To address this problem, this paper introduces a trainable clustering algorithm into the integration framework by deep-unfolding a non-parametric Bayesian model called the infinite Gaussian mixture model (iGMM). Specifically, the speaker embeddings are optimized during training so that they better fit iGMM clustering, using a novel clustering loss based on Adjus
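As a rough illustration of what "deep unfolding" a mixture-model clustering can mean, the sketch below runs a fixed number of EM iterations as a fixed sequence of differentiable-style steps, so each iteration plays the role of one layer. This is not the authors' method: it swaps in a finite, spherical Gaussian mixture with a fixed variance in place of the iGMM, uses NumPy rather than a trainable framework, and the function name, parameters, and initialization scheme are all assumptions made for illustration.

```python
import numpy as np


def unfolded_gmm_clustering(embeddings, num_components=2, num_iters=5, variance=0.05):
    """Run a fixed number of EM iterations of a spherical GMM.

    Hypothetical sketch: a finite GMM stands in for the iGMM, and each
    loop iteration corresponds to one "unfolded" layer.
    """
    n, _ = embeddings.shape
    # Assumed initialization: evenly spaced embeddings serve as initial means.
    init_idx = np.linspace(0, n - 1, num_components).astype(int)
    means = embeddings[init_idx].copy()
    weights = np.full(num_components, 1.0 / num_components)

    for _ in range(num_iters):
        # E-step: soft assignment of each embedding to each component.
        sq_dist = ((embeddings[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_resp = np.log(weights + 1e-12)[None, :] - 0.5 * sq_dist / variance
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate means and mixture weights from the soft assignments.
        counts = resp.sum(axis=0)
        means = (resp.T @ embeddings) / (counts[:, None] + 1e-12)
        weights = counts / n

    return resp, means, weights
```

Because the number of iterations is fixed and every step is a smooth function of the embeddings, a version written in an automatic-differentiation framework could in principle pass gradients from a clustering loss back to the embedding extractor, which is the kind of coupling the abstract describes.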
Related papers
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021) 16.48
- Integrating End-to-end Neural And Clustering-based Diarization: Getting The Best Of Both Worlds (2020) 13.74
- End-to-end Speaker Diarization As Post-processing (2020) 11.08
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020) 0.00
- End-to-end Supervised Hierarchical Graph Clustering For Speaker Diarization (2024) 5.24
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024) 3.58
- Enhancements For Audio-only Diarization Systems (2019) 0.00
- Reformulating Speaker Diarization As Community Detection With Emphasis On Topological Structure (2022) 5.84