EMO-TTA: Improving Test-time Adaptation Of Audio-language Models For Speech Emotion Recognition
2025 · Jiacheng Shi, Hongfei Du, Y. Alicia Hong, et al.
Abstract
Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance degradation in out-of-domain scenarios. Test-time adaptation (TTA) provides a promising solution but often relies on gradient-based updates or prompt tuning, limiting flexibility and practicality. We propose Emo-TTA, a lightweight, training-free adaptation framework that incrementally updates class-conditional statistics via an Expectation-Maximization procedure for explicit test-time distribution estimation, using ALM predictions as priors. Emo-TTA operates on individual test samples without modifying model weights. Experiments on six out-of-domain SER benchmarks show consistent accuracy improvements over prior TTA baselines, demonstrating the effectiveness of statistical adaptation in aligning model predictions with evolving test distributions.
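The abstract describes the adaptation loop only at a high level. Below is a minimal Python sketch of one way to update class-conditional statistics online with EM, using ALM predictions as priors. It is not the Emo-TTA implementation. The class name `OnlineEMAdapter`, the isotropic Gaussian class-conditionals with a fixed shared `variance`, the unit pseudo-count per class, and initialising class means from some per-class embedding (for example, class text embeddings) are all illustrative assumptions.

```python
import math


class OnlineEMAdapter:
    """Hypothetical sketch of training-free, per-sample test-time adaptation.

    Assumptions (not taken from the paper): isotropic Gaussian class-conditionals
    with a fixed shared variance, class means initialised from per-class
    embeddings, and the ALM's zero-shot class probabilities used as the prior.
    """

    def __init__(self, init_means, variance=1.0):
        self.means = [list(m) for m in init_means]  # running class means
        self.counts = [1.0] * len(init_means)       # pseudo-count for each initial mean
        self.variance = variance

    def _log_likelihood(self, x, k):
        # log N(x; mu_k, sigma^2 I), dropping the constant shared by all classes
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, self.means[k]))
        return -0.5 * sq / self.variance

    def step(self, x, alm_probs):
        # E-step: posterior is proportional to ALM prior times Gaussian likelihood
        logs = [math.log(max(p, 1e-12)) + self._log_likelihood(x, k)
                for k, p in enumerate(alm_probs)]
        top = max(logs)
        weights = [math.exp(l - top) for l in logs]  # shift for numerical stability
        total = sum(weights)
        resp = [w / total for w in weights]
        # Incremental M-step: responsibility-weighted running-mean update,
        # mu_new = mu_old + r / (n_old + r) * (x - mu_old); model weights untouched
        for k, r in enumerate(resp):
            self.counts[k] += r
            eta = r / self.counts[k]
            self.means[k] = [mi + eta * (xi - mi) for mi, xi in zip(self.means[k], x)]
        return resp


# Usage: one test embedding, two classes, ALM prior slightly favouring class 0
adapter = OnlineEMAdapter(init_means=[[1.0, 0.0], [0.0, 1.0]], variance=0.5)
resp = adapter.step([0.9, 0.1], alm_probs=[0.6, 0.4])
pred = max(range(len(resp)), key=resp.__getitem__)  # adapted prediction
```

Each step costs O(d) per class and needs no gradients, so samples can be processed one at a time as they arrive. A fuller version would likely also track covariances. This sketch keeps only means to stay short.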
Related papers
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023)
- Active Learning With Task Adaptation Pre-training For Speech Emotion Recognition (2024)
- Multiple Consistency-guided Test-time Adaptation For Contrastive Audio-language Models With Unlabeled Audio (2024)
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025)
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024)
- Examining Test-time Adaptation For Personalized Child Speech Recognition (2024)
- SGEM: Test-time Adaptation For Automatic Speech Recognition Via Sequential-level Generalized Entropy Minimization (2023)
- Advancing Test-time Adaptation In Wild Acoustic Test Settings (2023)