Meta-learning With Latent Space Clustering In Generative Adversarial Network For Speaker Diarization
2020 · Monisankha Pal, Manoj Kumar, Raghuveer Peri, et al.
Abstract
Most speaker diarization systems built on x-vector embeddings are vulnerable to noisy environments and lack domain robustness. Earlier work on speaker diarization that used a generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space showed promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we take the pre-trained encoder from ClusterGAN and fine-tune it with a prototypical loss (meta-ClusterGAN, or MCGAN) under the meta-learning paradigm. Experiments are conducted on CALLHOME telephone conversations, AMI meeting data, DIHARD II (dev set), a challenging multi-domain corpus, and two child-clinician interaction corpora (ADOS, BOSCC) from the autism spectrum disorder domain. Extensive analyses of the experimental data are done to investig
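The prototypical loss mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the standard prototypical-network formulation (class prototypes as mean support embeddings, squared Euclidean distance, softmax over negative distances), and the function name `prototypical_loss`, the speaker labels, and the 2-D toy vectors standing in for encoder outputs are all hypothetical.

```python
import math
from collections import defaultdict


def prototypical_loss(support, support_labels, query, query_labels):
    """Return (mean loss, accuracy) for one training episode.

    support, query: lists of equal-length float vectors (embeddings).
    support_labels, query_labels: speaker labels for those vectors.
    Assumes every query label also appears among the support labels.
    """
    # Group support embeddings by speaker.
    by_speaker = defaultdict(list)
    for vec, label in zip(support, support_labels):
        by_speaker[label].append(vec)

    # Prototype = per-dimension mean of a speaker's support embeddings.
    prototypes = {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in by_speaker.items()
    }

    total_loss, correct = 0.0, 0
    for vec, label in zip(query, query_labels):
        # Logit per speaker = negative squared Euclidean distance (assumed metric).
        logits = {
            spk: -sum((a - b) ** 2 for a, b in zip(vec, proto))
            for spk, proto in prototypes.items()
        }
        # Numerically stable log-sum-exp over all speakers in the episode.
        top = max(logits.values())
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        # Negative log-probability of the true speaker.
        total_loss += log_norm - logits[label]
        correct += max(logits, key=logits.get) == label
    return total_loss / len(query), correct / len(query)


# Toy episode: two well-separated hypothetical speakers.
support = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
support_labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
query = [[0.1, 0.1], [5.1, 4.9]]
query_labels = ["spk_a", "spk_b"]
loss, accuracy = prototypical_loss(support, support_labels, query, query_labels)
print(loss, accuracy)
```

In a fine-tuning setup of the kind the abstract describes, the vectors would presumably be encoder outputs for x-vectors and the loss would be backpropagated through the encoder; the sketch only shows how the episode loss itself is computed.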
Related papers
- A Study Of Semi-supervised Speaker Diarization System Using GAN Mixture Model (2019) · 0.00
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021) · 2.26
- Discriminative Neural Clustering For Speaker Diarisation (2019) · 10.07
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021) · 16.48
- End-to-end Supervised Hierarchical Graph Clustering For Speaker Diarization (2024) · 5.24
- Generative Adversarial Speaker Embedding Networks For Domain Robust End-to-end Speaker Verification (2018) · 0.00
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020) · 5.24
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024) · 3.58