Multiple Consistency-guided Test-time Adaptation For Contrastive Audio-language Models With Unlabeled Audio
2024 · Gongyu Chen, Haomin Zhang, Chaofan Ding, et al.
Abstract
Pre-trained Audio-Language Models (ALMs) show impressive zero-shot generalization, and test-time adaptation (TTA) methods aim to improve their performance on a target domain without annotations. However, previous TTA methods for ALMs in zero-shot classification tend to get stuck in incorrect model predictions. To further boost performance, we propose multiple forms of guidance for prompt learning without annotated labels. First, we impose consistency guidance on both the context tokens and the domain tokens of ALMs. Second, we impose both consistency across multiple augmented views of each single test sample and contrastive learning across different test samples. Third, we propose a corresponding end-to-end learning framework for the proposed test-time adaptation method without annotated labels. We extensively evaluate our approach on 12 downstream tasks across domains; our proposed adaptation method leads to 4.41% (max 7.50%) av
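The two label-free signals named in the abstract (agreement across augmented views of one test sample, and contrast across different test samples) can be illustrated with a minimal sketch. This is not the authors' formulation: the choice of a mean-KL consistency term, an InfoNCE-style contrastive term, the function names, and the `temperature` value are all assumptions made for illustration, and the prompt-token consistency from the first point is not modeled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def view_consistency_loss(view_logits):
    """Assumed consistency term: mean KL(p_view || p_mean) over the
    augmented views of ONE test sample. It is zero when all views agree."""
    probs = [softmax(logits) for logits in view_logits]
    n_views, n_classes = len(probs), len(probs[0])
    mean = [sum(p[c] for p in probs) / n_views for c in range(n_classes)]
    kl_total = 0.0
    for p in probs:
        kl_total += sum(pc * math.log(pc / mc)
                        for pc, mc in zip(p, mean) if pc > 0)
    return kl_total / n_views

def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_sample_contrastive_loss(views_a, views_b, temperature=0.1):
    """Assumed contrastive term (InfoNCE-style): views_a[i] and views_b[i]
    are embeddings of two views of sample i and form the positive pair;
    views_b[j] for j != i serve as negatives."""
    n = len(views_a)
    loss = 0.0
    for i in range(n):
        sims = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_denominator - sims[i]
    return loss / n

# Hypothetical usage with toy logits and 2-D embeddings.
consistency = view_consistency_loss([[2.0, 0.5, 0.1], [1.8, 0.7, 0.0]])
contrastive = cross_sample_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                            [[0.9, 0.1], [0.2, 0.8]])
total_loss = consistency + contrastive  # equal weighting is an assumption
```

In a real test-time adaptation loop, a sum of this kind would be minimized with respect to the learnable prompt tokens only, with the audio and text encoders kept frozen; the relative weighting of the terms is a design choice this sketch does not take from the source.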
Related papers
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025) 0.00
- EMO-TTA: Improving Test-time Adaptation Of Audio-language Models For Speech Emotion Recognition (2025) 0.00
- LI-TTA: Language Informed Test-time Adaptation For Automatic Speech Recognition (2024) 3.58
- Advancing Test-time Adaptation In Wild Acoustic Test Settings (2023) 2.26
- Examining Test-time Adaptation For Personalized Child Speech Recognition (2024) 0.00
- PAT: Parameter-free Audio-text Aligner To Boost Zero-shot Audio Classification (2024) 0.00
- Continual Test-time Adaptation For End-to-end Speech Recognition On Noisy Speech (2024) 4.52
- ConsistencyTTA: Accelerating Diffusion-based Text-to-audio Generation With Consistency Distillation (2023) 6.77