End-to-end Neural Diarization: From Transformer To Conformer
2021 · Yi Chieh Liu, Eunjung Han, Chul Lee, et al.
Abstract
We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech. We first show that data augmentation and convolutional subsampling layers enhance the original self-attentive EEND in the Transformer-based EEND, and then Conformer gives an additional gain over the Transformer-based EEND. However, we notice that the Conformer-based EEND does not generalize as well from simulated to real conversation data as the Transformer-based model. This leads us to quantify the mismatch between simulated data and real speaker behavior in terms of temporal statistics reflecting turn-taking between speakers, and investigate its correlation with diarization error. By mixing simulated and real data in EEND training, we mitigate the mismatch further, with Conformer-based EEND achieving 24% error reduction over the baseline SA-EEND system
Related papers
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023)
- Speech-aware Neural Diarization With Encoder-decoder Attractor Guided By Attention Constraints (2024)
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)
- Auxiliary Loss Of Transformer With Residual Connection For End-to-end Speaker Diarization (2021)
- Towards Word-level End-to-end Neural Speaker Diarization With Auxiliary Network (2023)
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- Encoder-decoder Based Attractors For End-to-end Neural Diarization (2021)