Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints
2024 · Peiying Lee, Hauyun Guo, Berlin Chen
Abstract
End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It handles a flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that uses speaker activity information to guide the lower-layer Transformer encoders of the EEND-EDA model, strengthening the effect of their self-attention modules. Results on the public Mini LibriSpeech dataset demonstrate the effectiveness of the approach, reducing the Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to allow further research and reproducibility.
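The abstract does not spell out the form of the auxiliary loss, so the following is only a minimal sketch of one way a speaker-activity-guided attention constraint could look, not the authors' formulation. It assumes a hypothetical target pattern in which each frame should attend to frames that share at least one active speaker (and silent frames to other silent frames), and penalizes the squared difference between a self-attention map and that target. The function names, the mean-squared-error penalty, and the weighting factor `alpha` are all illustrative assumptions.

```python
import numpy as np


def same_speaker_target(activity):
    """Build a hypothetical (T, T) attention target from speaker activity.

    activity: (T, S) binary matrix, 1 where speaker s is active in frame t.
    Frame i is asked to attend uniformly to every frame j that shares at
    least one active speaker with it; silent frames attend to silent frames.
    """
    activity = np.asarray(activity, dtype=float)
    shared = (activity @ activity.T) > 0        # frames with a common speaker
    silent = activity.sum(axis=1) == 0          # frames with no active speaker
    both_silent = np.outer(silent, silent)
    target = (shared | both_silent).astype(float)
    # Every row contains at least its own diagonal entry, so the sum is > 0.
    return target / target.sum(axis=1, keepdims=True)


def attention_guidance_loss(attn, activity):
    """Mean squared error between an attention map and the assumed target.

    attn: (T, T) row-stochastic self-attention weights from one head.
    """
    target = same_speaker_target(activity)
    return float(np.mean((np.asarray(attn, dtype=float) - target) ** 2))


# Toy example: 4 frames, 2 speakers (frames 0-1: speaker A, 2: speaker B, 3: silence).
activity = [[1, 0], [1, 0], [0, 1], [0, 0]]
uniform_attn = np.full((4, 4), 0.25)            # an uninformative attention head
aux = attention_guidance_loss(uniform_attn, activity)

# Assumed combination with the main diarization loss (placeholder value).
alpha = 0.1                                     # hypothetical weighting factor
diarization_loss = 1.0                          # stand-in for the model's main loss
total_loss = diarization_loss + alpha * aux
```

In a setup like this, the auxiliary term would be computed only for the attention maps of the lower encoder layers and added to the usual diarization objective during training; at inference time it plays no role.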
Related papers
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021) 13.05
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023) 6.77
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023) 0.00
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023) 9.59
- LS-EEND: Long-form Streaming End-to-end Neural Diarization With Online Attractor Extraction (2024) 3.58
- BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization For A Variable Number Of Speakers (2020) 10.74
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022) 10.35
- Do End-to-end Neural Diarization Attractors Need To Encode Speaker Characteristic Information? (2024) 2.26