End-to-end Speaker Diarization Conditioned On Speech Activity And Overlap Detection
2021 · Yuki Takashima, Yusuke Fujita, Shinji Watanabe, et al.
Abstract
We present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). EEND has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve EEND, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. Using the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity detection and overlap detection, which are subtasks of speaker diarization. Experimental results show that the proposed method can leverage a subtask to effectively model speaker diarization and outperforms conventional EEND systems in terms of diarization error rate.
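To make the task dependency concrete, one plausible reading of the chain-rule formulation is p(Y, Z | X) = p(Z | X) p(Y | Z, X), where X is the audio, Y the per-speaker activity labels, and Z a subtask label sequence; this notation is an assumption, not taken from the abstract. The sketch below does not implement the proposed model. It only shows why speech activity and overlap detection count as subtasks: both label sequences follow deterministically from the diarization labels. The function name and the toy data are hypothetical.

```python
from typing import List, Tuple


def derive_subtask_labels(diar: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Derive frame-level subtask labels from diarization labels.

    diar[t][s] is 1 if speaker s is active in frame t, else 0
    (an assumed multi-label layout, one row per frame).
    """
    # Speech activity: at least one speaker is active in the frame.
    sad = [1 if sum(frame) >= 1 else 0 for frame in diar]
    # Overlap: two or more speakers are active in the same frame.
    osd = [1 if sum(frame) >= 2 else 0 for frame in diar]
    return sad, osd


# Toy example: 4 frames, 2 speakers.
labels = [[0, 0], [1, 0], [1, 1], [0, 1]]
sad, osd = derive_subtask_labels(labels)
print(sad)  # [0, 1, 1, 1]
print(osd)  # [0, 0, 1, 0]
```

Because the subtask labels come for free from the diarization annotation, a multitask setup of this kind needs no extra labeling effort.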
Related papers
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023)
- End-to-end Speaker Diarization As Post-processing (2020)
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022)
- EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings (2023)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)