Metadata-enhanced Speech Emotion Recognition: Augmented Residual Integration And Co-attention In Two-stage Fine-tuning
2024 · Zixiang Wan, Ziyue Qiu, Yiyang Liu, et al.
Abstract
Speech Emotion Recognition (SER) involves analyzing vocal expressions to determine the emotional state of speakers, a task in which making comprehensive and thorough use of the audio information is paramount. We therefore propose a novel approach built on self-supervised learning (SSL) models that exploits all available auxiliary information, specifically metadata, to enhance performance. Within a two-stage fine-tuning method based on multi-task learning, we introduce the Augmented Residual Integration (ARI) module, which enhances the transformer layers in the encoder of SSL models. The module efficiently preserves acoustic features from all levels, thereby significantly improving performance on the metadata-related auxiliary tasks, which require features from different levels. Moreover, we incorporate a Co-attention module because it complements ARI, enabling the model to effectively utilize multidimensional information and contextual relationships from the metadata-related auxiliary tasks.
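The abstract does not give the equations for ARI, the Co-attention module, or the training loss. Purely as an illustration of the two general ideas it names (drawing on features from every transformer layer of an SSL encoder, and adding metadata-related auxiliary losses to the emotion loss), the sketch below uses a generic softmax-weighted sum of per-layer features added residually onto the top layer, plus a weighted multi-task loss. It is not the authors' ARI module or objective: the function names, the layer-weighting scheme, the utterance-level pooled vectors, and `aux_weight` are all assumptions chosen for readability, and it uses plain Python lists instead of a deep-learning framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_layers(layer_feats: Sequence[Sequence[float]],
                     layer_logits: Sequence[float]) -> List[float]:
    """Hypothetical layer integration (NOT the paper's ARI):
    a softmax-weighted sum of every layer's pooled feature vector,
    added back onto the top layer as a residual connection."""
    if len(layer_feats) != len(layer_logits):
        raise ValueError("need one weight logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_feats[-1])
    mixed = [0.0] * dim
    for w, feats in zip(weights, layer_feats):
        for i in range(dim):
            mixed[i] += w * feats[i]
    # Residual: keep the top layer's representation and add the layer mixture,
    # so information from lower (more acoustic) layers is not discarded.
    return [top + mix for top, mix in zip(layer_feats[-1], mixed)]


def multitask_loss(main_loss: float, aux_losses: Sequence[float],
                   aux_weight: float = 0.5) -> float:
    """Hypothetical multi-task objective: the emotion loss plus a weighted
    sum of auxiliary (metadata-related) task losses."""
    return main_loss + aux_weight * sum(aux_losses)


if __name__ == "__main__":
    # Toy pooled features from three hypothetical encoder layers.
    layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused = integrate_layers(layers, [0.0, 0.0, 0.0])  # equal layer weights
    print(fused)  # roughly [1.667, 1.667]
    print(multitask_loss(1.0, [0.4, 0.6], aux_weight=0.5))  # 1.5
```

In a real SSL setup the per-layer features would be frame-level tensors taken from the encoder's hidden states, and the layer logits would be learned jointly with the classifier; the residual term is one simple way to keep the top-layer representation intact while still exposing lower-level features to tasks that need them.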
Related papers
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024)
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023)
- End-to-end Integration Of Speech Emotion Recognition With Voice Activity Detection Using Self-supervised Learning Features (2024)
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024)
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021)
- Jointly Fine-tuning "BERT-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020)
- Speech Emotion Recognition With Co-attention Based Multi-level Acoustic Information (2022)
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021)