EEND-DEMUX: End-to-end Neural Speaker Diarization Via Demultiplexed Speaker Embeddings
2023 · Sung Hwan Mun, Min Hyun Han, Canyeong Moon, et al.
Abstract
Recent studies have sought to further improve end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transforming each separated latent variable into its corresponding speech activity. EEND-DEMUX can directly obtain separated speaker embeddings through the demultiplexing operation in the inference phase without an external speaker diarization system, an embedding extractor, or a heuristic decoding technique. Furthermore, we employ a multi-head cross-attention mechanism to effectively capture the correlation between mixture and separated speaker embeddings. We formulate three loss functions based on matching, orthogonality, and sparsity constraints to learn robust demultiplexed speaker embeddings. The experimental results on the LibriMix dataset show consistently improved p
Related papers
- EEND-SS: Joint End-to-end Neural Speaker Diarization And Speech Separation For Flexible Number Of Speakers (2022)
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024)
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024)
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023)
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024)
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021)