Multi-channel End-to-end Neural Diarization With Distributed Microphones
2021 · Shota Horiguchi, Yuki Takashima, Paola Garcia, et al.
Abstract
Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and suitable for distributed microphone settings. We also propose a model adaptation method using only single-channel recordings. With simulated and real-recorded datasets, we demonstrated that the proposed method outperformed conventional EEND when a multi-channel input was given while maintaining comparable performance with a single-channel input. We also showed that the proposed method performed well even when spatial information is inoperative given multi-channel inputs, such as in hybrid meetings in which the utterances of multiple remote participants
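As a rough illustration of how an encoder can stay independent of the number and ordering of microphones, the sketch below applies self-attention across the channel axis of each frame and then averages over channels. It is a simplified NumPy toy, not the authors' spatio-temporal or co-attention encoder: the function names (`cross_channel_attention`, `encode`), the absence of learned projections, and the final channel averaging are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_channel_attention(x):
    """Self-attention across the channel axis, applied per frame.

    x: array of shape (T, C, D) = (frames, channels, feature dim).
    Hypothetical simplification: queries, keys, and values are the raw
    features, with no learned projection matrices.
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)  # (T, C, C)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ x                              # (T, C, D)


def encode(x):
    """Map (T, C, D) to (T, D) for any channel count C >= 1."""
    attended = cross_channel_attention(x)
    # Averaging over channels removes any dependence on microphone order.
    return attended.mean(axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4, 8))      # 5 frames, 4 microphones, 8-dim features
multi = encode(x)                       # multi-channel input
single = encode(x[:, :1])               # the same code path with one microphone
shuffled = encode(x[:, [2, 0, 3, 1]])   # microphones in a different order
```

Because the attention weights are computed over whatever channels are present and the result is averaged, the output shape does not depend on the microphone count, and reordering the microphones leaves the output unchanged up to floating-point error. With a single channel the attention weight is 1, so this toy encoder passes the input through unchanged.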
Related papers
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020), 0.00
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023), 0.00
- EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings (2023), 0.00
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021), 16.48
- BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization For A Variable Number Of Speakers (2020), 10.74
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024), 3.58
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022), 10.35
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024), 0.00