Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition
2025 · Ruoyu Zhao, Xiantao Jiang, F. Richard Yu, et al.
Abstract
Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. Cross-Linguistic SER (CLSER) remains a challenging research problem because linguistic and acoustic features vary significantly across languages. In this study, we propose HuMP-CAT, a novel approach that combines HuBERT, MFCC, and prosodic features. These features are fused using a cross-attention transformer (CAT) mechanism during feature extraction. Transfer learning is applied to carry knowledge learned from a source emotional speech dataset over to the target corpus for emotion recognition. We use IEMOCAP as the source dataset to train the source model and evaluate the proposed method on seven datasets in five languages (i.e., English, German, Spanish, Italian, and Chinese). We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75% across the seven datasets, with notable performance of 88.69% on EMO
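The abstract does not state the HuBERT layer, feature dimensions, number of attention heads, or which stream supplies the queries. The following is only a minimal pure-Python sketch of generic single-head scaled dot-product cross-attention fusion, assuming HuBERT frames act as queries and handcrafted frames (e.g. MFCC plus prosodic features) act as keys and values. The function names, the residual connection, and the mean pooling are illustrative assumptions, not the authors' HuMP-CAT implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = feature dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(x_q: Matrix, x_kv: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention.

    Queries come from one feature stream (x_q); keys and values come from
    another (x_kv), so each query frame gathers information from the other stream.
    """
    q = matmul(x_q, w_q)    # (query frames, d_attn)
    k = matmul(x_kv, w_k)   # (key frames, d_attn)
    v = matmul(x_kv, w_v)   # (key frames, d_value)
    scale = 1.0 / math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # (query frames, key frames)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)  # (query frames, d_value)


def fuse_streams(hubert: Matrix, handcrafted: Matrix,
                 w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical fusion step: HuBERT frames attend to handcrafted
    (e.g. MFCC + prosodic) frames, then a residual connection adds the
    attended vectors back. w_v must map to the HuBERT dimension."""
    attended = cross_attention(hubert, handcrafted, w_q, w_k, w_v)
    return [[h + a for h, a in zip(h_row, a_row)]
            for h_row, a_row in zip(hubert, attended)]


def mean_pool(frames: Matrix) -> List[float]:
    """Average over time to get one utterance-level vector for a classifier."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]
```

A toy usage, with identity projections standing in for learned weights:

```python
identity = [[1.0, 0.0], [0.0, 1.0]]
hubert = [[0.5, 1.0], [2.0, -1.0], [0.0, 0.0]]      # 3 frames, toy values
handcrafted = [[1.0, 0.0], [0.0, 1.0]]              # 2 frames, toy values
utterance_vector = mean_pool(fuse_streams(hubert, handcrafted,
                                          identity, identity, identity))
```

Letting the self-supervised stream query the handcrafted stream is one plausible design choice; the reverse direction, or attending in both directions and concatenating, is equally compatible with the abstract's description.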
Related papers
- Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-attention Cues In Multitask Learning (2024) · 0.00
- Cross-language Speech Emotion Recognition Using Multimodal Dual Attention Transformers (2023) · 0.00
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) · 2.26
- Semi-supervised Cross-lingual Speech Emotion Recognition (2022) · 10.85
- Decoding Emotions: A Comprehensive Multilingual Study Of Speech Models For Speech Emotion Recognition (2023) · 0.00
- GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning For Speech Emotion Recognition (2024) · 0.00
- Ctl-mtnet: A Novel Capsnet And Transfer Learning-based Mixed Task Net For The Single-corpus And Cross-corpus Speech Emotion Recognition (2022) · 10.21
- Distilled Hubert For Mobile Speech Emotion Recognition: A Cross-corpus Validation Study (2025) · 0.00