Multi-speaker And Wide-band Simulated Conversations As Training Data For End-to-end Neural Diarization
2022 · Federico Landini, Mireia Diez, Alicia Lozano-Diez, et al.
Abstract
End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once. Many flavors of end-to-end models have been proposed but all of them require (so far non-existing) large amounts of annotated data for training. The compromise solution consists in generating synthetic data and the recently proposed simulated conversations (SC) have shown remarkable improvements over the original simulated mixtures (SM). In this work, we create SC with multiple speakers per conversation and show that they allow for substantially better performance than SM, also reducing the dependence on a fine-tuning stage. We also create SC with wide-band public audio sources and present an analysis on several evaluation sets. Together with this publication, we release the recipes for generating such data and models trained on public sets as well as the implementation to efficiently handle multiple speakers per conversa
Related papers
- Improving The Naturalness Of Simulated Conversations For End-to-end Neural Diarization (2022)
- Speaker Conditioned Acoustic Modeling For Multi-speaker Conversational ASR (2021)
- Data Efficient Child-adult Speaker Diarization With Simulated Conversations (2024)
- An Experimental Review Of Speaker Diarization Methods With Application To Two-speaker Conversational Telephone Speech Recordings (2023)
- Improving End-to-end Neural Diarization Using Conversational Summary Representations (2023)
- DiaPer: End-to-end Neural Diarization With Perceiver-based Attractors (2023)
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- Property-aware Multi-speaker Data Simulation: A Probabilistic Modelling Technique For Synthetic Data Generation (2023)