Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation For Automatic Speech Recognition
2022 · Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
Abstract
Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions. Test-time Adaptation (TTA), previously explored in computer vision, aims to adapt a model trained on source domains to yield better predictions for test samples, which are often out-of-domain, without accessing the source data. Here, we propose the Single-Utterance Test-time Adaptation (SUTA) framework for ASR, which is, to the best of our knowledge, the first TTA study on ASR. Single-utterance TTA is a more realistic setting: it does not assume that test data are sampled from an identical distribution, and it does not delay on-demand inference by first collecting a batch of adaptation data. SUTA consists of unsupervised objectives with an efficient adaptation strategy. Empirical results demonstrate that SUTA effectively improves the performance of the source ASR model eval
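To make the single-utterance setting concrete, here is a minimal sketch of one adaptation episode. The abstract does not name the unsupervised objectives, so the sketch assumes entropy minimization, a common TTA objective, and it stands in a toy frame-wise linear classifier for a real ASR model. The function names, the choice to adapt only the bias, and the hyperparameters are all hypothetical, chosen for illustration only.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_entropy(p, eps=1e-12):
    # Average per-frame Shannon entropy of the predicted distributions.
    return float(-(p * np.log(p + eps)).sum(axis=-1).mean())


def adapt_single_utterance(frames, W, b, steps=20, lr=0.1, eps=1e-12):
    """Adapt a copy of the bias on ONE unlabeled utterance (assumed objective:
    entropy minimization). frames: (T, D), W: (D, C), b: (C,).
    The source parameters W and b are never modified."""
    b_adapted = b.astype(float).copy()
    history = []
    for _ in range(steps):
        p = softmax(frames @ W + b_adapted)                     # (T, C)
        H = -(p * np.log(p + eps)).sum(axis=-1, keepdims=True)  # (T, 1)
        history.append(float(H.mean()))
        # Gradient of per-frame entropy w.r.t. logits: -p_k * (log p_k + H).
        grad_logits = -p * (np.log(p + eps) + H)
        # The bias is shared by all frames, so average the gradient over frames.
        b_adapted -= lr * grad_logits.mean(axis=0)
    history.append(mean_entropy(softmax(frames @ W + b_adapted)))
    return b_adapted, history


# Usage: each utterance starts again from the source parameters, so no batch
# of test data has to be collected and no source data is needed.
rng = np.random.default_rng(0)
W_src = rng.normal(size=(8, 5))
b_src = np.zeros(5)
for _ in range(2):
    utterance = rng.normal(size=(30, 8))  # stand-in for acoustic features
    b_new, hist = adapt_single_utterance(utterance, W_src, b_src)
    print(f"entropy before: {hist[0]:.4f}, after: {hist[-1]:.4f}")
```

Resetting to the source parameters for every utterance reflects the setting described above, where test utterances are not assumed to share a distribution. A real system would update selected parameters of a neural ASR model with an automatic-differentiation framework rather than the hand-derived gradient used here.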
Related papers
- Continual Test-time Adaptation For End-to-end Speech Recognition On Noisy Speech (2024) · 4.52
- Examining Test-time Adaptation For Personalized Child Speech Recognition (2024) · 0.00
- SUTA-LM: Bridging Test-time Adaptation And Language Model Rescoring For Robust ASR (2025) · 0.00
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024) · 3.58
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025) · 0.00
- Advancing Test-time Adaptation In Wild Acoustic Test Settings (2023) · 2.26
- A Simple Baseline For Domain Adaptation In End To End ASR Systems Using Synthetic Data (2022) · 7.16
- E-BATS: Efficient Backpropagation-free Test-time Adaptation For Speech Foundation Models (2025) · 0.00