Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization
2021 · Prachi Singh, Sriram Ganapathy
Abstract
Automatic speaker diarization techniques typically involve a two-stage processing approach where audio segments of fixed duration are converted to vector representations in the first stage. This is followed by an unsupervised clustering of the representations in the second stage. In most of the prior approaches, these two stages are performed in an isolated manner with independent optimization steps. In this paper, we propose a representation learning and clustering algorithm that can be iteratively performed for improved speaker diarization. The representation learning is based on principles of self-supervised learning while the clustering algorithm is a graph structural method based on path integral clustering (PIC). The representation learning step uses the cluster targets from PIC and the clustering step is performed on embeddings learned from the self-supervised deep model. This iterative approach is referred to as self-supervised clustering (SSC). The diarization experiments are
Related papers
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020)
- Enhancements For Audio-only Diarization Systems (2019)
- Learning Deep Representations By Multilayer Bootstrap Networks For Speaker Diarization (2019)
- Self-supervised Reflective Learning Through Self-distillation And Online Clustering For Speaker Representation Learning (2024)
- End-to-end Speaker Diarization As Post-processing (2020)
- Toeplitz Inverse Covariance Based Robust Speaker Clustering For Naturalistic Audio Streams (2019)
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024)
- A Reinforcement Learning Framework For Online Speaker Diarization (2023)