BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization For A Variable Number Of Speakers
2020 · Eunjung Han, Chul Lee, Andreas Stolcke
Abstract
We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the Encoder-Decoder-Attractor (EDA) architecture of Horiguchi et al., but utilizes the incremental Transformer encoder, attending only to its left contexts and using block-level recurrence in the hidden states to carry information from block to block, making the algorithm complexity linear in time. We propose two variants: For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND. With more than two speakers, the accuracy gap between online and offline grows, but the algorithm still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with context size of 10 seconds. For limited-latency BW-EDA-E
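The block-level recurrence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention layer with identity projections (no learned weights), unmasked attention inside the current block, and a cache of raw input frames standing in for hidden states. The names `attend`, `blockwise_encode`, `block_size`, and `context_frames` are hypothetical. The sketch only shows why a bounded left context keeps the total cost linear in the number of frames.

```python
import math


def attend(queries, memory):
    """Scaled dot-product attention with identity projections (a simplification)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * m[d] for w, m in zip(weights, memory)) / total
            for d in range(dim)
        ])
    return out


def blockwise_encode(frames, block_size, context_frames):
    """Encode frames block by block, attending only to cached left context.

    Returns the encoded frames and the number of query-key pairs scored.
    """
    cache = []    # frames carried over from earlier blocks (assumed stand-in for hidden states)
    outputs = []
    pairs = 0
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        memory = cache + block  # left context plus the current block
        outputs.extend(attend(block, memory))
        pairs += len(block) * len(memory)
        # Keep only the most recent frames as context for the next block.
        cache = memory[-context_frames:] if context_frames > 0 else []
    return outputs, pairs


if __name__ == "__main__":
    frames = [[float(i % 3), 1.0] for i in range(40)]
    _, pairs_short = blockwise_encode(frames, block_size=5, context_frames=10)
    _, pairs_long = blockwise_encode(frames * 2, block_size=5, context_frames=10)
    print(pairs_short, pairs_long)
```

Because each block scores at most `block_size * (context_frames + block_size)` pairs, the pair count is bounded by the number of frames times a constant, whereas full self-attention over the whole recording would grow quadratically.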
Related papers
- LS-EEND: Long-form Streaming End-to-end Neural Diarization With Online Attractor Extraction (2024)
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024)
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021)
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022)
- Online Streaming End-to-end Neural Diarization Handling Overlapping Speech And Flexible Numbers Of Speakers (2021)
- Online End-to-end Neural Diarization With Speaker-tracing Buffer (2020)
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)