Encoder-decoder Based Attractors For End-to-end Neural Diarization
2021 · Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, et al.
Abstract
This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce an encoder-decoder-based attractor calculation module (EDA) to EEND. Once frame-wise embeddings are obtained, EDA sequentially generates speaker-wise attractors on the basis of a sequence-to-sequence method using an LSTM encoder-decoder. The attractor generation continues until a stopping condition is satisfied; thus, the number of attractors can be flexible. Diarization results are then estimated as dot products of the attractors and embeddings. The embeddings from speaker overlaps result in larger dot product values with multiple attractors; thus, this method can deal with speaker overlaps. Because the maximum number
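The two steps the abstract describes (generating attractors until a stopping condition holds, then scoring each frame against each attractor with a dot product) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the LSTM encoder-decoder is replaced by a hypothetical `next_attractor` callback, and the existence probability, the 0.5 thresholds, the sigmoid on the dot products, the `max_steps` safety cap, and all toy values are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_attractors(
    next_attractor: Callable[[int], Tuple[Vector, float]],
    max_steps: int = 10,          # assumed safety cap for this sketch
    stop_threshold: float = 0.5,  # assumed stopping threshold
) -> List[Vector]:
    """Collect attractors one at a time until the stopping condition holds.

    `next_attractor(step)` is a hypothetical stand-in for one decoder step;
    it returns a candidate attractor and an assumed existence probability.
    """
    attractors: List[Vector] = []
    for step in range(max_steps):
        attractor, exist_prob = next_attractor(step)
        if exist_prob < stop_threshold:
            break  # stopping condition: no further speaker is predicted
        attractors.append(attractor)
    return attractors


def diarize(
    embeddings: Sequence[Vector],
    attractors: Sequence[Vector],
    activity_threshold: float = 0.5,  # assumed decision threshold
) -> List[List[bool]]:
    """Mark speaker s active in frame t when sigmoid(e_t . a_s) exceeds the threshold."""
    return [
        [sigmoid(dot(e, a)) > activity_threshold for a in attractors]
        for e in embeddings
    ]


# Toy decoder outputs: two plausible speakers, then a low-probability candidate.
candidates = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.8), ([0.0, 0.0], 0.1)]
attractors = generate_attractors(lambda step: candidates[step])

# Toy frame-wise embeddings: speaker 0 only, speaker 1 only, overlap, silence.
embeddings = [[4.0, -4.0], [-4.0, 4.0], [4.0, 4.0], [-4.0, -4.0]]
labels = diarize(embeddings, attractors)
```

In this toy setup the third frame has a large dot product with both attractors, so both speakers are marked active, which mirrors how the abstract says overlapping speech is handled.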
Related papers
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023) 6.77
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024) 0.00
- Do End-to-end Neural Diarization Attractors Need To Encode Speaker Characteristic Information? (2024) 2.26
- LS-EEND: Long-form Streaming End-to-end Neural Diarization With Online Attractor Extraction (2024) 3.58
- Online Neural Diarization Of Unlimited Numbers Of Speakers Using Global And Local Attractors (2022) 10.07
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023) 0.00
- Frame-wise Streaming End-to-end Speaker Diarization With Non-autoregressive Self-attention-based Attractors (2023) 2.26
- Towards Neural Diarization For Unlimited Numbers Of Speakers Using Global And Local Attractors (2021) 11.29