SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models
2025 · Yuan-Kuei Wu, Yang Liu, Yiteng Huang, et al.
Abstract
Spoken Language Models (SLMs) are increasingly central to modern speech-driven applications, but their performance degrades under acoustic shift: real-world noise, reverberation, and microphone variation. Prior solutions rely on offline domain adaptation, which is post-hoc, data-intensive, and slow. We introduce the first test-time adaptation (TTA) framework for generative SLMs that process interleaved audio-text prompts. Our method updates a small, targeted subset of parameters during inference using only the incoming utterance, requiring no source data or labels. This stabilizes token distributions and improves robustness to acoustic variability without degrading core task accuracy. Evaluated on automatic speech recognition, speech translation, and 19 audio understanding tasks from AIR-Bench, our approach yields consistent gains under diverse corruptions. Because adaptation touches only a small fraction of weights, it is both compute- and memory-efficient, supporting deployment on resourc…
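The abstract does not state which parameters are updated or which unsupervised objective drives adaptation. The sketch below is therefore only an illustration of the general idea (label-free, single-utterance updates to a small parameter subset), under stated assumptions: it swaps in entropy minimization, a common source-free TTA objective, and a toy linear token head whose weight matrix `W` stays frozen while only the bias `b` is adapted. All names (`adapt_utterance`, `mean_entropy`, the vocabulary and feature sizes) are hypothetical, and pure Python is used so the example runs without external libraries.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy(p: Vector) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def logits(W: Matrix, b: Vector, x: Vector) -> Vector:
    # Toy token head z = W x + b; W is frozen, only b is adapted (assumption).
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def mean_entropy(W: Matrix, b: Vector, frames: List[Vector]) -> float:
    # Average token-distribution entropy over the frames of one utterance.
    return sum(entropy(softmax(logits(W, b, x))) for x in frames) / len(frames)


def adapt_utterance(W: Matrix, b0: Vector, frames: List[Vector],
                    steps: int = 20, lr: float = 0.1) -> Vector:
    # Copy b0 so every utterance starts from the unadapted weights (episodic TTA).
    b = list(b0)
    n = len(frames)
    for _ in range(steps):
        grad = [0.0] * len(b)
        for x in frames:
            p = softmax(logits(W, b, x))
            h = entropy(p)
            for j, pj in enumerate(p):
                # dH/dz_j = -p_j (log p_j + H), and dz_j/db_j = 1.
                if pj > 0.0:
                    grad[j] += -pj * (math.log(pj) + h) / n
        # Plain gradient descent on the bias only; no labels are used.
        b = [bj - lr * gj for bj, gj in zip(b, grad)]
    return b


# Usage on random toy data (hypothetical sizes: vocab V, feature dim D, T frames).
random.seed(0)
V, D, T = 5, 4, 8
W = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(V)]
b0 = [0.0] * V
frames = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]

before = mean_entropy(W, b0, frames)
b = adapt_utterance(W, b0, frames, steps=20, lr=0.1)
after = mean_entropy(W, b, frames)
print(f"mean token entropy: {before:.3f} -> {after:.3f}")
```

Restricting updates to `b` mirrors, in miniature, the abstract's point that touching only a small fraction of weights keeps adaptation cheap: the gradient here has length `V` rather than `V * D + V`. Pure entropy minimization can collapse predictions if run too long, which is one reason TTA methods typically limit steps or add regularizers; the paper's actual objective may differ.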
Related papers
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024) · 3.58
- Examining Test-time Adaptation For Personalized Child Speech Recognition (2024) · 0.00
- SUTA-LM: Bridging Test-time Adaptation And Language Model Rescoring For Robust ASR (2025) · 0.00
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation For Automatic Speech Recognition (2022) · 8.09
- Continual Test-time Adaptation For End-to-end Speech Recognition On Noisy Speech (2024) · 4.52
- Multiple Consistency-guided Test-time Adaptation For Contrastive Audio-language Models With Unlabeled Audio (2024) · 2.26
- EMO-TTA: Improving Test-time Adaptation Of Audio-language Models For Speech Emotion Recognition (2025) · 0.00
- E-BATS: Efficient Backpropagation-free Test-time Adaptation For Speech Foundation Models (2025) · 0.00