Integrating End-to-end Neural And Clustering-based Diarization: Getting The Best Of Both Worlds
2020 · Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara
Abstract
Recent diarization technologies fall into two categories, clustering-based and end-to-end neural approaches, which have different pros and cons. Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors. While this can be seen as the current state-of-the-art approach, one that works on a variety of challenging data with reasonable robustness and accuracy, it has a critical disadvantage: it cannot handle overlapped speech, which is inevitable in natural conversational data. In contrast, end-to-end neural diarization (EEND), which directly predicts diarization labels using a neural network, was devised to handle overlapped speech. While EEND, which can easily incorporate emerging deep-learning technologies, has started outperforming the x-vector clustering approach on some realistic databases, it is difficult to make it work for "long" recordings (e.g., recordings longer than 10 minutes) because of, e.g., its
Related papers
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- Tight Integration Of Neural- And Clustering-based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model (2022)
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024)
- End-to-end Speaker Diarization As Post-processing (2020)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)
- An Experimental Review Of Speaker Diarization Methods With Application To Two-speaker Conversational Telephone Speech Recordings (2023)
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023)