Data Efficient Child-adult Speaker Diarization With Simulated Conversations
2024 · Anfeng Xu, Tiantian Feng, Helen Tager-Flusberg, et al.
Abstract
Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of this automated analysis. However, publicly available child-adult speaker diarization solutions are scarce because of privacy concerns and a lack of annotated datasets, and manually annotating data for each scenario is both time-consuming and costly. To overcome these challenges, we propose a data-efficient solution that creates simulated child-adult conversations using AudioSet. We then train a Whisper encoder-based model on these conversations, achieving strong zero-shot performance on child-adult speaker diarization with real datasets. Model performance improves substantially when the model is fine-tuned with only 30 minutes of real training data, and LoRA further improves transfer learning performance. The source code and the child-adult speaker diarization model trained on simulated conversations are publicly available.
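As a rough illustration of what "simulated conversations" can mean for diarization training, the sketch below places child and adult utterances on a timeline with random silences and derives frame-level labels. It is not the authors' pipeline: the label values, the 20 ms frame hop, the uniform gap distribution, and the function names are all assumptions made for this example. It also ignores overlapping speech and the audio itself, and models only the label timeline.

```python
import random

# Hypothetical label values and frame hop; not taken from the paper.
SILENCE, CHILD, ADULT = 0, 1, 2
FRAME_SEC = 0.02


def simulate_labels(utterances, max_gap_sec=1.0, seed=0):
    """Lay out (speaker, duration_sec) utterances in order, each preceded by
    a random silence, and return one label per frame."""
    rng = random.Random(seed)
    labels = []
    for speaker, duration in utterances:
        gap_frames = int(rng.uniform(0.0, max_gap_sec) / FRAME_SEC)
        labels.extend([SILENCE] * gap_frames)
        labels.extend([speaker] * round(duration / FRAME_SEC))
    return labels


def to_segments(labels):
    """Collapse frame labels into (label, start_sec, end_sec) speech segments,
    skipping silence."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != SILENCE:
                segments.append((labels[start], start * FRAME_SEC, i * FRAME_SEC))
            start = i
    return segments


if __name__ == "__main__":
    conversation = [(ADULT, 1.2), (CHILD, 0.6), (ADULT, 0.8)]
    for label, start, end in to_segments(simulate_labels(conversation, seed=42)):
        print("child" if label == CHILD else "adult", f"{start:.2f}-{end:.2f}s")
```

In a real setup the same timeline would also drive the mixing of the corresponding audio clips, so that the frame labels serve as training targets for the diarization model.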
Related papers
- Multi-speaker And Wide-band Simulated Conversations As Training Data For End-to-end Neural Diarization (2022) 8.60
- Exploring Speech Foundation Models For Speaker Diarization In Child-adult Dyadic Interactions (2024) 5.24
- Spot The Conversation: Speaker Diarisation In The Wild (2020) 15.31
- Improving The Naturalness Of Simulated Conversations For End-to-end Neural Diarization (2022) 9.59
- Speaker Diarization As A Fully Online Learning Problem In Minivox (2020) 0.00
- Towards Unsupervised Speaker Diarization System For Multilingual Telephone Calls Using Pre-trained Whisper Model And Mixture Of Sparse Autoencoders (2024) 2.26
- Property-aware Multi-speaker Data Simulation: A Probabilistic Modelling Technique For Synthetic Data Generation (2023) 6.34
- Dicow: Diarization-conditioned Whisper For Target Speaker Automatic Speech Recognition (2024) 8.09