Examining Test-time Adaptation For Personalized Child Speech Recognition
2024 · Zhonghao Shi, Xuan Shi, Anfeng Xu, et al.
Abstract
Automatic speech recognition (ASR) models often experience performance degradation due to data domain shifts introduced at test time, a challenge that is further amplified for child speakers. Test-time adaptation (TTA) methods have shown great potential in bridging this domain gap. However, the use of TTA to adapt ASR models to the individual differences in each child's speech has not yet been systematically studied. In this work, we investigate the effectiveness of two widely used TTA methods, SUTA and SGEM, in adapting off-the-shelf ASR models and their fine-tuned versions for child speech recognition, with the goal of enabling continuous, unsupervised adaptation at test time. Our findings show that TTA significantly improves the performance of both off-the-shelf and fine-tuned ASR models, both on average and across individual child speakers, compared to unadapted baselines. However, while TTA helps adapt to individual variability, its benefit may still be limited for non-linguistic child speech.
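As a rough illustration of what unsupervised adaptation at test time optimizes, the sketch below computes a frame-level entropy objective, the family of objectives that SGEM's title names. It is a minimal toy under stated assumptions, not the authors' implementation: the function names and the toy logits are hypothetical, and a real TTA method would minimize this quantity by gradient steps on selected model parameters rather than only measuring it.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of token logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) across frames.

    frame_logits: list of frames, each a list of per-token logits.
    Lower values mean the model is more confident on this utterance;
    entropy-minimization TTA uses a quantity like this as its
    label-free loss.
    """
    entropies = []
    for logits in frame_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)


# Hypothetical toy outputs: 2 frames, 3 tokens each.
uncertain = [[0.0, 0.0, 0.0], [0.1, 0.0, -0.1]]   # near-uniform predictions
confident = [[5.0, 0.0, 0.0], [0.0, 6.0, 0.0]]    # sharply peaked predictions

loss_before = mean_frame_entropy(uncertain)
loss_after = mean_frame_entropy(confident)
```

In this toy, `loss_after` is smaller than `loss_before`, which is the direction an entropy-minimizing adaptation step pushes the model on each test utterance, without needing any transcript.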
Related papers
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation For Automatic Speech Recognition (2022) · 8.09
- Continual Test-time Adaptation For End-to-end Speech Recognition On Noisy Speech (2024) · 4.52
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025) · 0.00
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024) · 3.58
- SGEM: Test-time Adaptation For Automatic Speech Recognition Via Sequential-level Generalized Entropy Minimization (2023) · 6.77
- Advancing Test-time Adaptation In Wild Acoustic Test Settings (2023) · 2.26
- SUTA-LM: Bridging Test-time Adaptation And Language Model Rescoring For Robust ASR (2025) · 0.00
- EMO-TTA: Improving Test-time Adaptation Of Audio-language Models For Speech Emotion Recognition (2025) · 0.00