Active Learning With Task Adaptation Pre-training For Speech Emotion Recognition
2024 · Dongyuan Li, Ying Zhang, Yusong Wang, et al.
Abstract
Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called After, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are
Related papers
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023) 6.34
- EMO-TTA: Improving Test-time Adaptation Of Audio-language Models For Speech Emotion Recognition (2025) 0.00
- Exploring Wav2vec 2.0 Fine-tuning For Improved Speech Emotion Recognition (2021) 15.67
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023) 10.97
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019) 14.58
- Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, And Augmenting (2023) 0.00
- Towards Adversarial Learning Of Speaker-invariant Representation For Speech Emotion Recognition (2019) 0.00
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021) 12.74