Transformer Attractors For Robust And Efficient End-to-end Neural Diarization
2023 · Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, et al.
Abstract
End-to-end neural diarization with encoder-decoder based attractors (EEND-EDA) performs diarization with a single neural network. EDA handles a variable number of speakers by using an LSTM-based encoder-decoder that generates speaker-wise attractors autoregressively. In this paper, we propose to replace EDA with a transformer-based attractor calculation (TA) module. TA is composed of a combiner block and a transformer decoder. The combiner block generates conversation-dependent (CD) embeddings by incorporating learned conversational information into a global set of embeddings. These CD embeddings then serve as the input to the transformer decoder. Results on public datasets show that EEND-TA achieves a 2.68% absolute diarization error rate (DER) improvement over EEND-EDA, and that EEND-TA inference is 1.28 times faster than EEND-EDA inference.
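To make the data flow of the TA module easier to follow, here is a minimal pure-Python sketch with random, untrained weights. It is not the authors' implementation, and everything beyond the abstract is an assumption: the "conversational information" is taken to be the mean of the frame embeddings, the combiner is assumed to be a linear projection of each concatenated [global embedding; summary] vector, and the transformer decoder is reduced to one single-head self-attention step plus one cross-attention step (no feed-forward layer, layer norm, or multiple heads, with one set of projection weights shared for brevity). Following EEND-EDA, frame-wise speaker activities are sigmoids of frame-attractor dot products, and each attractor gets a sigmoid existence probability. The class name `TransformerAttractorSketch` and all method names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TransformerAttractorSketch:
    """Hypothetical, simplified TA: combiner block + one reduced decoder layer."""

    def __init__(self, dim: int, max_speakers: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        # Global learned embeddings, one per potential speaker (assumed).
        self.global_emb = random_matrix(max_speakers, dim, rng)
        # Combiner: linear map from [global; summary] (2*dim) to dim (assumed).
        self.w_comb = random_matrix(dim, 2 * dim, rng)
        # Attention projections, shared by both attention steps for brevity.
        self.w_q = random_matrix(dim, dim, rng)
        self.w_k = random_matrix(dim, dim, rng)
        self.w_v = random_matrix(dim, dim, rng)
        # Linear layer for attractor existence probabilities (EDA-style).
        self.w_exist = random_matrix(1, dim, rng)[0]

    def combiner(self, frames: Matrix) -> Matrix:
        # Conversation summary: mean of the frame embeddings (assumption).
        summary = [sum(f[d] for f in frames) / len(frames) for d in range(self.dim)]
        # Concatenate each global embedding with the summary, then project.
        return [matvec(self.w_comb, g + summary) for g in self.global_emb]

    def attend(self, queries: Matrix, memory: Matrix) -> Matrix:
        # Single-head scaled dot-product attention with a residual connection.
        keys = [matvec(self.w_k, m) for m in memory]
        values = [matvec(self.w_v, m) for m in memory]
        out = []
        for q in queries:
            qp = matvec(self.w_q, q)
            weights = softmax([dot(qp, k) / math.sqrt(self.dim) for k in keys])
            context = [sum(w * v[d] for w, v in zip(weights, values))
                       for d in range(self.dim)]
            out.append([qi + ci for qi, ci in zip(q, context)])
        return out

    def forward(self, frames: Matrix):
        cd = self.combiner(frames)            # conversation-dependent embeddings
        cd = self.attend(cd, cd)              # self-attention among speaker queries
        attractors = self.attend(cd, frames)  # cross-attention to frame embeddings
        exist_prob = [sigmoid(dot(self.w_exist, a)) for a in attractors]
        # Frame-wise speaker activity: sigmoid of frame-attractor dot products.
        activity = [[sigmoid(dot(f, a)) for a in attractors] for f in frames]
        return attractors, exist_prob, activity


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    model = TransformerAttractorSketch(dim=8, max_speakers=4)
    attractors, exist_prob, activity = model.forward(frames)
    print(len(attractors), len(activity), len(activity[0]))  # prints "4 20 4"
```

In this sketch all attractors come out of a single forward pass, because the speaker queries are fixed learned vectors made conversation-aware by the combiner. This contrasts with EDA, which emits attractors one LSTM step at a time.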
Related papers
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021) · 13.05
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024) · 0.00
- Do End-to-end Neural Diarization Attractors Need To Encode Speaker Characteristic Information? (2024) · 2.26
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023) · 0.00
- End-to-end Neural Diarization: From Transformer To Conformer (2021) · 10.85
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023) · 9.59
- Auxiliary Loss Of Transformer With Residual Connection For End-to-end Speaker Diarization (2021) · 8.60
- Target Speaker Voice Activity Detection With Transformers And Its Integration With End-to-end Neural Diarization (2022) · 10.48