Designing An Effective Metric Learning Pipeline For Speaker Diarization
2018 · Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Huan Song, et al.
Abstract
State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities in unseen data. However, much of the recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained \(i\)-vectors to representations learned via different sequence modeling architectures (e.g., 1D-CNNs, LSTMs, attention models), while adopting off-the-shelf metric learning solutions. In this paper, we argue that, regardless of the feature extractor, it is crucial to carefully design a metric learning pipeline, namely the loss function, the sampling strategy and the discriminative margin parameter, for building robust diarization systems. Furthermore, we propose to adopt a fine-grained validation process to obtain a comprehensive evaluation of the generalization power of metric learning pipelines. To this end, we measure diarization performance across different language speakers, and
Related papers
- Triplet Network With Attention For Speaker Diarization (2018) 7.16
- Improved Large-margin Softmax Loss For Speaker Diarisation (2019) 6.34
- Aligning Speakers: Evaluating And Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (extended Version) (2023) 0.00
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021) 2.26
- An Approach To Optimize Inference Of The DIART Speaker Diarization Pipeline (2024) 0.00
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021) 2.26
- Novel Architectures For Unsupervised Information Bottleneck Based Speaker Diarization Of Meetings (2020) 8.09
- Diarizationlm: Speaker Diarization Post-processing With Large Language Models (2024) 10.21