Online End-to-end Neural Diarization With Speaker-tracing Buffer
2020 Β· Yawen Xue, Shota Horiguchi, Yusuke Fujita, et al.
Abstract
This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker's permutation problem due to the possibility to assign speaker regions incorrectly across the recording. To circumvent this inconsistency, we proposed a speaker-tracing buffer mechanism that selects several input frames representing the speaker permutation information from previous chunks and stores them in a buffer. These buffered frames are stacked with the input frames in the current chunk and fed into a self-attention network. Our method ensures consistent diarization outputs across the buffer and the current chunk by checking the correlation between their corresponding outputs. Additionally, we trained SA-EEND with variable chunk-sizes to mitigate the mismatch between training and inference introduced by the speaker-tracing buffer mechanism. Experimental results, including online SA-EEND and variable chunk
Authors
(none)
Tags
Stats
Related papers
- Online Streaming End-to-end Neural Diarization Handling Overlapping Speech And Flexible Numbers Of Speakers (2021)0.00
- Frame-wise Streaming End-to-end Speaker Diarization With Non-autoregressive Self-attention-based Attractors (2023)2.26
- LS-EEND: Long-form Streaming End-to-end Neural Diarization With Online Attractor Extraction (2024)3.58
- BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization For A Variable Number Of Speakers (2020)10.74
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021)11.49
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)0.00
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024)0.00
- Online Neural Diarization Of Unlimited Numbers Of Speakers Using Global And Local Attractors (2022)10.07