End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification
2020 · Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, et al.
Abstract
The most common approach to speaker diarization is clustering of speaker embeddings. However, the clustering-based approach has a number of problems: (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps. To solve these problems, we propose End-to-End Neural Diarization (EEND), in which a neural network directly outputs speaker diarization results given a multi-speaker recording. To realize such an end-to-end model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function to directly minimize diarization errors. Besides its end-to-end simplicity, the EEND method can explicitly handle speaker overlaps during training and inference. Just by feeding multi-speaker recordings with corresponding speaker segment labels, our model can
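The abstract's two core ideas, frame-level multi-label targets and a permutation-free objective, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The (speaker, start_frame, end_frame) segment format and the helper names `segments_to_labels`, `bce`, and `permutation_free_bce` are hypothetical. The network is assumed to emit one sigmoid probability per speaker per frame, and the loss is assumed to be binary cross-entropy averaged over frames and speakers, then minimized over all orderings of the reference speakers. Because a label row may contain several 1s, overlapping speech needs no special handling.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical reference format: (speaker_index, start_frame, end_frame), end exclusive.
Segment = Tuple[int, int, int]


def segments_to_labels(
    segments: Sequence[Segment], num_speakers: int, num_frames: int
) -> List[List[int]]:
    """Build a T x S 0/1 matrix; a row may contain several 1s (speaker overlap)."""
    labels = [[0] * num_speakers for _ in range(num_frames)]
    for spk, start, end in segments:
        for t in range(max(start, 0), min(end, num_frames)):
            labels[t][spk] = 1
    return labels


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one (probability, target) pair, clamped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_bce(
    probs: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> Tuple[float, Tuple[int, ...]]:
    """Mean frame-wise BCE under the best speaker permutation of the reference.

    Output s is scored against reference speaker perm[s]; taking the minimum over
    all S! permutations makes the arbitrary order of the speaker outputs irrelevant.
    """
    num_frames, num_speakers = len(labels), len(labels[0])
    best_loss, best_perm = math.inf, tuple(range(num_speakers))
    for perm in itertools.permutations(range(num_speakers)):
        total = sum(
            bce(probs[t][s], labels[t][perm[s]])
            for t in range(num_frames)
            for s in range(num_speakers)
        )
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: speaker 0 talks in frames 0-2, speaker 1 in frames 2-4 (overlap at frame 2).
labels = segments_to_labels([(0, 0, 3), (1, 2, 5)], num_speakers=2, num_frames=5)
# Made-up sigmoid outputs in which output 0 happens to track reference speaker 1.
probs = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.9], [0.8, 0.1], [0.9, 0.2]]
loss, perm = permutation_free_bce(probs, labels)
print(labels)  # [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]]
print(perm, round(loss, 4))
```

Here the best permutation pairs output 0 with reference speaker 1 and output 1 with reference speaker 0, so the swapped output order is not penalized. Enumerating all S! permutations is only practical for a small number of speakers; the loop is written for clarity rather than speed.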
Related papers
- End-to-end Neural Speaker Diarization With Permutation-free Objectives (2019) · 21.98
- Integrating End-to-end Neural And Clustering-based Diarization: Getting The Best Of Both Worlds (2020) · 13.74
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023) · 0.00
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021) · 10.21
- End-to-end Speaker Diarization As Post-processing (2020) · 11.08
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024) · 3.58
- EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings (2023) · 0.00
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021) · 16.48