End-to-end Neural Speaker Diarization With Permutation-free Objectives
2019 · Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, et al.
Abstract
In this paper, we propose a novel end-to-end neural-network-based speaker diarization method. Unlike most existing methods, our proposed method does not have separate modules for extraction and clustering of speaker representations. Instead, our model has a single neural network that directly outputs speaker diarization results. To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem, and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem. Besides its end-to-end simplicity, the proposed method also benefits from being able to explicitly handle overlapping speech during training and inference. Because of this benefit, our model can be easily trained or adapted with real-recorded multi-speaker conversations just by feeding the corresponding multi-speaker segment labels. We evaluated the proposed method on simulated speech mixtures. The
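The permutation-free objective described in the abstract can be illustrated with a minimal sketch. It assumes the network emits one speech-activity probability per speaker per frame, scores each assignment of output channels to reference speakers with binary cross-entropy, and keeps the lowest-scoring assignment. The function names, the brute-force search over permutations, and the toy values are illustrative assumptions, not the authors' implementation.

```python
import itertools
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(probs, labels):
    """Return (loss, perm) minimising mean BCE over speaker permutations.

    probs, labels: lists of T frames, each a list of S per-speaker values.
    perm[s] is the reference speaker matched to output channel s.
    Hypothetical helper: brute force, so only practical for small S.
    """
    num_frames = len(probs)
    num_speakers = len(probs[0])
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(num_speakers)):
        total = sum(
            bce(probs[t][s], labels[t][perm[s]])
            for t in range(num_frames)
            for s in range(num_speakers)
        )
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: two speakers, three frames, the middle frame is overlapped.
labels = [[1, 0], [1, 1], [0, 1]]
# The outputs track the speakers well, but in the opposite channel order.
probs = [[0.1, 0.9], [0.8, 0.9], [0.9, 0.2]]

loss, perm = permutation_free_loss(probs, labels)
print(perm, round(loss, 4))
```

Because each frame carries an independent 0/1 label per speaker, the overlapped middle frame needs no special handling, which is the multi-label property the abstract refers to. The search visits S! permutations, so this brute-force form is only reasonable for a small number of speakers.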
Related papers
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- End-to-end Diarization For Variable Number Of Speakers With Local-global Networks And Discriminative Speaker Embeddings (2021)
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023)
- End-to-end Speaker Diarization As Post-processing (2020)
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023)
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022)
- EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings (2023)
- Online End-to-end Neural Diarization With Speaker-tracing Buffer (2020)