DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors
2023 · Federico Landini, Mireia Diez, Themos Stafylakis, et al.
Abstract
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
Related papers
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024) 0.00
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021) 13.05
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023) 0.00
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023) 6.77
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021) 16.48
- BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization For A Variable Number Of Speakers (2020) 10.74
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020) 0.00
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023) 0.00