Quality-controlled Multimodal Emotion Recognition In Conversations With Identity-based Transfer Learning And MAMBA Fusion
2025 · Zanxu Wang, Homayoon Beigi
Abstract
This paper addresses data quality issues in multimodal emotion recognition in conversation (MERC) through systematic quality control and multi-stage transfer learning. We implement a quality control pipeline for the MELD and IEMOCAP datasets that validates speaker identity, audio-text alignment, and face detection. We leverage transfer learning from speaker and face recognition, assuming that identity-discriminative embeddings capture not only stable acoustic and facial traits but also person-specific patterns of emotional expression. We employ RecoMadeEasy(R) engines to extract 512-dimensional speaker and face embeddings, fine-tune MPNet-v2 for emotion-aware text representations, and adapt these features through emotion-specific MLPs trained on unimodal datasets. MAMBA-based trimodal fusion achieves 64.8% accuracy on MELD and 74.3% on IEMOCAP. These results show that combining identity-based audio and visual embeddings with emotion-tuned text representations on a quality-controlled su
Related papers
- Bemerc: Behavior-aware Mllm-based Framework For Multimodal Emotion Recognition In Conversation (2025) 0.00
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022) 10.07
- Whose Emotion Matters? Speaking Activity Localisation Without Prior Knowledge (2022) 7.74
- MIAR: Modality Interaction And Alignment Representation Fuison For Multimodal Emotion (2026) 0.00
- Qieemo: Speech Is All You Need In The Emotion Recognition In Conversations (2025) 0.00
- Enhancing Modal Fusion By Alignment And Label Matching For Multimodal Emotion Recognition (2024) 6.34
- Multimodal Emotion Recognition And Sentiment Analysis In Multi-party Conversation Contexts (2025) 0.00
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024) 4.52