Highly Efficient Real-time Streaming And Fully On-device Speaker Diarization With Multi-stage Clustering
2022 · Quan Wang, Yiling Huang, Han Lu, et al.
Abstract
While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems. In this paper, we demonstrate that a multi-stage clustering strategy that uses different clustering algorithms for inputs of different lengths can address the multi-faceted challenges of on-device speaker diarization applications. Specifically, a fallback clusterer is used to handle short-form inputs; a main clusterer is used to handle medium-length inputs; and a pre-clusterer is used to compress long-form inputs before they are processed by the main clusterer. Both the main clusterer and the pre-clusterer can be configured with an upper bound on their computational complexity to adapt to devices with different resource constraints. This multi-stage clustering strategy is critical for streaming on-device speaker diarization systems, where the CPU, memory, and battery budgets are tight.
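The length-based dispatch described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `lower_bound` and `upper_bound` parameters, the exact comparison operators, and both toy clusterers are assumptions introduced here only to make the control flow concrete.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Clusterer = Callable[[List[Vector]], List[int]]
PreClusterer = Callable[[List[Vector], int], Tuple[List[Vector], List[int]]]


def multi_stage_cluster(
    embeddings: List[Vector],
    fallback: Clusterer,
    main: Clusterer,
    pre_cluster: PreClusterer,
    lower_bound: int,  # hypothetical: below this, input counts as short-form
    upper_bound: int,  # hypothetical: cap on what the main clusterer sees
) -> List[int]:
    """Pick a clustering path based on the number of speaker embeddings."""
    n = len(embeddings)
    if n == 0:
        return []
    if n < lower_bound:
        # Short-form input: use the fallback clusterer.
        return fallback(embeddings)
    if n <= upper_bound:
        # Medium-length input: the main clusterer handles it directly.
        return main(embeddings)
    # Long-form input: compress first, cluster the compressed points,
    # then map each original embedding to the label of its centroid.
    centroids, assignment = pre_cluster(embeddings, upper_bound)
    centroid_labels = main(centroids)
    return [centroid_labels[i] for i in assignment]


def greedy_clusterer(points: List[Vector], threshold: float = 1.0) -> List[int]:
    """Toy stand-in clusterer: join the first cluster whose anchor is close."""
    anchors: List[Vector] = []
    labels: List[int] = []
    for p in points:
        for k, a in enumerate(anchors):
            dist = sum((x - y) ** 2 for x, y in zip(p, a)) ** 0.5
            if dist <= threshold:
                labels.append(k)
                break
        else:
            anchors.append(p)
            labels.append(len(anchors) - 1)
    return labels


def chunk_pre_cluster(
    points: List[Vector], max_size: int
) -> Tuple[List[Vector], List[int]]:
    """Toy stand-in pre-clusterer: average consecutive chunks of points."""
    chunk = -(-len(points) // max_size)  # ceiling division
    centroids: List[Vector] = []
    assignment: List[int] = []
    for start in range(0, len(points), chunk):
        group = points[start:start + chunk]
        dim = len(group[0])
        centroids.append([sum(p[d] for p in group) / len(group) for d in range(dim)])
        assignment.extend([len(centroids) - 1] * len(group))
    return centroids, assignment


def single_speaker(points: List[Vector]) -> List[int]:
    """Toy fallback: treat every short input as one speaker."""
    return [0] * len(points)


long_input = [[0.0]] * 6 + [[10.0]] * 6
long_labels = multi_stage_cluster(
    long_input, single_speaker, greedy_clusterer, chunk_pre_cluster, 3, 4
)
```

In this sketch, `upper_bound` is the single knob that limits how many points the main clusterer ever processes, which is one way to read the abstract's "upper bound" on complexity. A real system would replace the toy stand-ins with proper clustering algorithms.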
Related papers
- End-to-end Speaker Diarization As Post-processing (2020) 11.08
- Enhancements For Audio-only Diarization Systems (2019) 0.00
- Overlap-aware Low-latency Online Speaker Diarization Based On End-to-end Local Segmentation (2021) 10.35
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024) 3.58
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021) 16.48
- Interrelate Training And Searching: A Unified Online Clustering Framework For Speaker Diarization (2022) 6.77
- Low-latency Online Speaker Diarization With Graph-based Label Generation (2021) 8.60
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020) 5.24