Exploring Speech Foundation Models For Speaker Diarization In Child-adult Dyadic Interactions
2024 · Anfeng Xu, Kevin Huang, Tiantian Feng, et al.
Abstract
Speech foundation models, trained on vast datasets, have opened unique opportunities for addressing challenging low-resource speech understanding tasks, such as those involving child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark the speaker diarization performance of the speech foundation models across varying input audio window sizes, speaker demographics, and training data ratios. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
Related papers
- Data Efficient Child-adult Speaker Diarization With Simulated Conversations (2024)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- A Comparison Study On Infant-parent Voice Diarization (2020)
- DiarizationLM: Speaker Diarization Post-processing With Large Language Models (2024)
- Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR (2024)
- LLM-based Speaker Diarization Correction: A Generalizable Approach (2024)
- Investigating The Effects Of Large-scale Pseudo-stereo Data And Different Speech Foundation Model On Dialogue Generative Spoken Language Model (2024)
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)