Do End-to-end Neural Diarization Attractors Need To Encode Speaker Characteristic Information?
2024 · Lin Zhang, Themos Stafylakis, Federico Landini, et al.
Abstract
In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, which are vector representations of the speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to encode some extra (possibly speaker-specific) information leads to small but consistent improvements in diarization performance. Despite architectural differences among EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We therefore believe that the main conclusions of this work can apply to other variants of EEND, and we hope this paper will help the community make more informed decisions when designing new systems.
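For readers unfamiliar with the variational information bottleneck mentioned in the abstract, the following is a minimal sketch of its generic form, not the authors' implementation. It assumes a diagonal Gaussian posterior over a single attractor vector and a standard normal prior. The names `kl_to_standard_normal`, `sample_bottleneck`, `vib_loss`, and the weight `beta` are hypothetical, and the abstract does not specify where the bottleneck is placed or how the loss is weighted.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_bottleneck(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vib_loss(task_loss, mu, log_var, beta):
    """Task loss plus a beta-weighted KL regularizer (generic VIB objective)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


# Hypothetical posterior parameters for one attractor (illustrative values only).
rng = random.Random(0)
mu = [0.5, -1.0, 0.0]
log_var = [0.0, 0.0, 0.0]  # unit variance in every dimension

z = sample_bottleneck(mu, log_var, rng)   # noisy attractor passed downstream
kl = kl_to_standard_normal(mu, log_var)   # information cost of this attractor
total = vib_loss(task_loss=1.0, mu=mu, log_var=log_var, beta=0.1)
```

In this generic formulation, a larger `beta` pushes the posterior toward the prior and so limits how much information the sampled vector can carry, while a smaller `beta` gives it more freedom. That is the kind of trade-off the abstract describes when it contrasts constrained attractors with attractors allowed to encode extra information.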
Related papers
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021) · 13.05
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023) · 6.77
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024) · 0.00
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023) · 0.00
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023) · 9.59
- LS-EEND: Long-form Streaming End-to-end Neural Diarization With Online Attractor Extraction (2024) · 3.58
- Frame-wise Streaming End-to-end Speaker Diarization With Non-autoregressive Self-attention-based Attractors (2023) · 2.26
- Online Neural Diarization Of Unlimited Numbers Of Speakers Using Global And Local Attractors (2022) · 10.07