Improving End-to-end Neural Diarization Using Conversational Summary Representations
2023 · Samuel J. Broughton, Lahiru Samarakoon
Abstract
Speaker diarization is the task of partitioning an audio recording by speaker identity. End-to-end neural diarization with encoder-decoder based attractor calculation (EEND-EDA) aims to solve this problem by directly outputting diarization results for a flexible number of speakers. Currently, the EDA module responsible for generating speaker-wise attractors is conditioned on zero vectors, which provide no relevant information to the network. In this work, we extend EEND-EDA by replacing the zero vectors input to the decoder with learned conversational summary representations. The updated EDA module sequentially generates speaker-wise attractors based on utterance-level information. We propose three methods to initialize the summary vector and investigate the effect of varying input recording lengths. On a range of publicly available test sets, our model achieves an absolute diarization error rate (DER) improvement of 1.90% over the baseline.
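The change to the decoder input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain tanh recurrent cell stands in for the encoder-decoder networks, mean pooling over frame embeddings stands in for the learned summary representation (the abstract does not specify the three initialization methods), and the number of decoding steps is fixed rather than chosen by the model. All function names and sizes are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_h, x, h):
    """One step of a simple tanh recurrent cell (a stand-in, by assumption)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


def encode(frames, w_in, w_h):
    """Run the encoder over all frame embeddings and return its final state."""
    h = [0.0] * len(w_h)
    for x in frames:
        h = rnn_step(w_in, w_h, x, h)
    return h


def decode_attractors(h0, decoder_input, num_steps, w_in, w_h):
    """Emit one attractor per step, feeding the same input vector each time."""
    h, attractors = h0, []
    for _ in range(num_steps):
        h = rnn_step(w_in, w_h, decoder_input, h)
        attractors.append(h)
    return attractors


def mean_pool(frames):
    """Hypothetical summary vector: the average of the frame embeddings."""
    return [sum(column) / len(frames) for column in zip(*frames)]


rng = random.Random(0)
DIM = 4  # toy embedding size, chosen only for illustration


def rand_matrix():
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


frames = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(10)]
enc_w_in, enc_w_h = rand_matrix(), rand_matrix()
dec_w_in, dec_w_h = rand_matrix(), rand_matrix()

h0 = encode(frames, enc_w_in, enc_w_h)

# Baseline: the decoder input is a zero vector at every step.
zero_input = [0.0] * DIM
baseline = decode_attractors(h0, zero_input, 3, dec_w_in, dec_w_h)

# Variant: the decoder input is a summary of the whole recording.
summary = mean_pool(frames)
proposed = decode_attractors(h0, summary, 3, dec_w_in, dec_w_h)
```

With a zero input, the decoder's input weights never contribute to the output, so the attractors depend only on the initial state and the recurrent weights. Feeding a summary vector gives those input weights something to act on at every step.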
Related papers
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024)
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021)
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023)
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023)
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024)
- Do End-to-end Neural Diarization Attractors Need To Encode Speaker Characteristic Information? (2024)
- End-to-end Neural Diarization: From Transformer To Conformer (2021)