DOVER: A Method For Combining Diarization Outputs
2019 · Andreas Stolcke, Takuya Yoshioka
Abstract
Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems. Diarization, the task of segmenting an audio stream into speaker-homogeneous and co-indexed regions, has so far not seen the benefit of this strategy because the structure of the task does not lend itself to a simple voting approach. This paper presents DOVER (diarization output voting error reduction), an algorithm for weighted voting among diarization hypotheses, in the spirit of the ROVER algorithm for combining speech recognition hypotheses. We evaluate the algorithm for diarization of meeting recordings with multiple microphones, and find that it consistently reduces diarization error rate over the average of results from individual channels, and often improves on the single best channel chosen by an oracle.
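The abstract notes that diarization outputs cannot be voted on directly, because each system's speaker labels are arbitrary. Voting therefore needs two steps: first map the labels onto a common set, then take a weighted vote over time regions. Below is a minimal Python sketch of those two steps. It is not the authors' implementation, and several details are assumptions: the greedy overlap-based label mapping, using the first hypothesis as the anchor, treating "no speech" as a label that can win a vote, and weights supplied by the caller. The names `map_labels` and `combine` are hypothetical. The sketch also assumes each hypothesis assigns at most one speaker at any time.

```python
from collections import defaultdict

# A hypothesis is a list of (start_sec, end_sec, speaker_label) segments.
Segment = tuple[float, float, str]


def overlap(s1: float, e1: float, s2: float, e2: float) -> float:
    """Length of the intersection of [s1, e1) and [s2, e2)."""
    return max(0.0, min(e1, e2) - max(s1, s2))


def map_labels(anchor: list[Segment], hyp: list[Segment], idx: int) -> dict[str, str]:
    """Map hyp's speaker labels onto anchor labels by total overlapping time.

    Assumption: a greedy one-to-one matching, not an optimal assignment.
    """
    score = defaultdict(float)  # (hyp_label, anchor_label) -> shared seconds
    for s1, e1, a in anchor:
        for s2, e2, h in hyp:
            score[(h, a)] += overlap(s1, e1, s2, e2)
    mapping: dict[str, str] = {}
    taken: set[str] = set()
    # Visit label pairs from most to least shared time.
    for (h, a), t in sorted(score.items(), key=lambda kv: -kv[1]):
        if t > 0 and h not in mapping and a not in taken:
            mapping[h] = a
            taken.add(a)
    # Speakers with no match get a label unique to this hypothesis.
    for _, _, h in hyp:
        mapping.setdefault(h, f"hyp{idx}:{h}")
    return mapping


def combine(hyps: list[list[Segment]], weights: list[float]) -> list[Segment]:
    """Hypothetical helper: align labels, then vote per time region."""
    anchor = hyps[0]  # assumption: the first hypothesis is the anchor
    aligned = [anchor]
    for i, hyp in enumerate(hyps[1:], start=1):
        m = map_labels(anchor, hyp, i)
        aligned.append([(s, e, m[spk]) for s, e, spk in hyp])

    # Split the timeline at every segment boundary of every hypothesis.
    cuts = sorted({t for hyp in aligned for s, e, _ in hyp for t in (s, e)})
    result: list[Segment] = []
    for t0, t1 in zip(cuts, cuts[1:]):
        votes = defaultdict(float)  # label (None = no speech) -> total weight
        for hyp, w in zip(aligned, weights):
            speakers = {spk for s, e, spk in hyp if s <= t0 and t1 <= e}
            if not speakers:
                votes[None] += w  # this hypothesis says "no speech" here
            for spk in speakers:
                votes[spk] += w
        winner = max(votes, key=votes.get)
        if winner is None:
            continue  # region left unlabeled (non-speech)
        if result and result[-1][2] == winner and result[-1][1] == t0:
            result[-1] = (result[-1][0], t1, winner)  # extend previous segment
        else:
            result.append((t0, t1, winner))
    return result
```

For example, `combine([[(0.0, 5.0, "A"), (5.0, 10.0, "B")], [(0.0, 4.0, "x"), (4.0, 10.0, "y")], [(0.0, 5.0, "p"), (5.0, 9.0, "q")]], [1.0, 1.0, 1.0])` maps `x`, `p` to `A` and `y`, `q` to `B`. It returns `[(0.0, 5.0, 'A'), (5.0, 10.0, 'B')]`, so the majority outvotes the one channel that switches speakers early. This sketch maps every hypothesis against the anchor only, so speakers absent from the anchor cannot pool votes across hypotheses. Mapping incrementally against the growing combined output would avoid that limitation.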
Related papers
- Improving Diarization Robustness Using Diversification, Randomization And The DOVER Algorithm (2019)
- DOVER-Lap: A Method For Combining Overlap-aware Diarization Outputs (2020)
- Microsoft Speaker Diarization System For The Voxceleb Speaker Recognition Challenge 2020 (2020)
- Once More Diarization: Improving Meeting Transcription Systems Through Segment-level Speaker Reassignment (2024)
- The Hitachi-JHU DIHARD III System: Competitive End-to-end Neural Diarization And X-vector Clustering Systems Combined By DOVER-Lap (2021)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)
- Probabilistic Fusion And Calibration Of Neural Speaker Diarization Models (2025)
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)