Continual Test-time Adaptation For End-to-end Speech Recognition On Noisy Speech
2024 · Guan-Ting Lin, Wei-Ping Huang, Hung-Yi Lee
Abstract
Deep learning-based end-to-end Automatic Speech Recognition (ASR) has made significant strides but still struggles on out-of-domain samples because of domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared to continual TTA. In this work, we first propose a Fast-slow TTA framework for ASR that leverages the advantages of both continual and non-continual TTA. Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To make DSUTA more robust to time-varying data, we design a dynamic reset strategy that automatically detects domain shifts and resets the model, making it more effective at handling multi-domain data. Our method demonstrates superior performance on various noisy ASR datasets, outperform
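As a rough illustration of the two ideas named in the abstract, the sketch below computes the mean per-frame entropy of an ASR model's output distributions (the kind of quantity an entropy-minimization objective drives down) and applies a simple threshold rule for deciding when to reset. It is a minimal sketch under stated assumptions, not the DSUTA implementation: the function names, the toy logits, and in particular the `should_reset` rule and its parameter `k` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) of the per-frame output distributions."""
    entropies = []
    for logits in frame_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)


def should_reset(recent_losses, reference_mean, reference_std, k=2.0):
    """Hypothetical shift detector (an assumption, not the paper's rule):
    flag a reset when the recent average loss exceeds the reference mean
    by more than k reference standard deviations."""
    recent_mean = sum(recent_losses) / len(recent_losses)
    return recent_mean > reference_mean + k * reference_std


# Toy logits: two frames, three output classes each.
confident = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
uncertain = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print(mean_frame_entropy(confident) < mean_frame_entropy(uncertain))
print(should_reset([1.0, 1.2], reference_mean=0.5, reference_std=0.1))
```

In an actual adaptation loop the entropy would be computed on model outputs with a differentiable framework and minimized by gradient steps on selected parameters; the pure-Python version here only shows what is being measured.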
Related papers
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation For Automatic Speech Recognition (2022)
- Examining Test-time Adaptation For Personalized Child Speech Recognition (2024)
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024)
- Advancing Test-time Adaptation In Wild Acoustic Test Settings (2023)
- SUTA-LM: Bridging Test-time Adaptation And Language Model Rescoring For Robust ASR (2025)
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025)
- A Simple Baseline For Domain Adaptation In End To End ASR Systems Using Synthetic Data (2022)
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)