Active Learning Based Fine-tuning Framework For Speech Emotion Recognition
2023 · Dongyuan Li, Yusong Wang, Kotaro Funakoshi, et al.
Abstract
Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require considerable time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL) based fine-tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training task and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20% of the samples improves accuracy by 8.45 percentage points and reduces
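
The sample-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which acquisition function the authors use, so everything here is an assumption made for illustration: predictive entropy stands in for "informative", minimum Euclidean distance to already-selected samples stands in for "diverse", and the names `select_batch` and `diversity_weight` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(probs, feats, k, diversity_weight=1.0):
    """Greedily pick k unlabeled indices (assumed scoring, not the paper's).

    score(i) = entropy of the model's prediction for sample i
               + diversity_weight * distance to the nearest chosen sample
    """
    remaining = set(range(len(probs)))
    chosen = []
    while remaining and len(chosen) < k:
        def score(i):
            # Distance term is 0 until at least one sample has been chosen.
            spread = min(
                (euclidean(feats[i], feats[j]) for j in chosen), default=0.0
            )
            return entropy(probs[i]) + diversity_weight * spread
        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy pool: class probabilities and 2-D embeddings for four unlabeled clips.
pool_probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
pool_feats = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [1.0, 0.0]]
picked = select_batch(pool_probs, pool_feats, k=2)
```

In a full loop, the selected indices would be labeled, added to the training set, and the model fine-tuned before the next selection round. Setting `diversity_weight=0` reduces the sketch to plain uncertainty sampling.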