A Comparison Study On Infant-parent Voice Diarization
2020 · Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain
Abstract
We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art algorithms in diarization. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing the loss function, to find the best performance. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. We found that our best system achieved 43.8% DER on the test dataset, compared to 55.4% DER achieved by LENA software. We also found that using a convolutional feature extractor instead of log-mel features significantly increases the performance of neural diarization.
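The abstract reports results as diarization error rate (DER). As a rough illustration of what that metric measures, here is a minimal frame-level sketch in Python. It is not the paper's scoring procedure: it assumes one label per frame (no overlapping speech), no forgiveness collar, and hypothesis labels already mapped onto the reference labels. The function name and the label strings `"chi"`, `"fem"`, and `"sil"` are hypothetical, chosen only for this example.

```python
def frame_level_der(reference, hypothesis, non_speech="sil"):
    """Simplified frame-level diarization error rate.

    Assumptions (not taken from the paper): one label per frame,
    no collar, and hypothesis labels already aligned with the
    reference label set.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != non_speech:
            speech += 1                 # frame counts toward the denominator
            if hyp == non_speech:
                missed += 1             # speech labeled as non-speech
            elif hyp != ref:
                confusion += 1          # speech assigned to the wrong speaker
        elif hyp != non_speech:
            false_alarm += 1            # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Hypothetical labels: "chi" = child, "fem" = adult female, "sil" = non-speech.
reference = ["sil", "chi", "chi", "fem", "fem", "sil", "chi", "chi"]
hypothesis = ["sil", "chi", "fem", "fem", "sil", "chi", "chi", "chi"]
der = frame_level_der(reference, hypothesis)
```

In this toy input there is one confused frame, one missed frame, and one false-alarm frame against six reference speech frames. Because false alarms count in the numerator but not the denominator, DER can exceed 100%.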
Related papers
- DIHARD II Is Still Hard: Experimental Results And Discussions From The DKU-LENOVO Team (2020)
- Data Efficient Child-adult Speaker Diarization With Simulated Conversations (2024)
- Exploring Speech Foundation Models For Speaker Diarization In Child-adult Dyadic Interactions (2024)
- Speaker Diarization With LSTM (2017)
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021)
- Diaper: End-to-end Neural Diarization With Perceiver-based Attractors (2023)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)
- Joint Training Or Not: An Exploration Of Pre-trained Speech Models In Audio-visual Speaker Diarization (2023)