MMER: Multimodal Multi-task Learning For Speech Emotion Recognition
2022 · Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, et al.
Abstract
In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early-fusion and cross-modal self-attention between text and acoustic modalities and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and results analysis to prove the effectiveness of our proposed approach.
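To make the phrase "cross-modal self-attention between text and acoustic modalities" concrete, here is a minimal sketch. It is not the authors' architecture: the abstract gives no layer details, so the function names (`cross_attention`, `fuse_utterance`), the absence of learned projections, the concatenation step, and the mean pooling are all illustrative assumptions. It shows only the general idea of each modality attending to the other before the features are combined.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    dim = len(queries[0])
    context_t = [list(col) for col in zip(*context)]
    scores = matmul(queries, context_t)  # shape: (n_queries, n_context)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, context), weights


def mean_pool(sequence):
    """Average a sequence of feature vectors into a single vector."""
    n = len(sequence)
    return [sum(col) / n for col in zip(*sequence)]


def fuse_utterance(text, audio):
    """Hypothetical fusion: each modality attends to the other, then features are joined."""
    text_ctx, _ = cross_attention(text, audio)   # text tokens look at audio frames
    audio_ctx, _ = cross_attention(audio, text)  # audio frames look at text tokens
    # Concatenate each token/frame with its cross-modal context (dim -> 2 * dim).
    text_fused = [t + c for t, c in zip(text, text_ctx)]
    audio_fused = [a + c for a, c in zip(audio, audio_ctx)]
    # Pool each stream and join them into one utterance-level vector (4 * dim).
    return mean_pool(text_fused) + mean_pool(audio_fused)


random.seed(0)
dim = 4
text = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]   # 3 toy text tokens
audio = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # 5 toy audio frames
utterance_vector = fuse_utterance(text, audio)
print(len(utterance_vector))  # 16, that is 4 * dim
```

In a real system the toy vectors would come from pretrained text and speech encoders, and the utterance vector would feed an emotion classifier trained jointly with the auxiliary tasks the abstract mentions.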
Related papers
- Enhancing Modal Fusion By Alignment And Label Matching For Multimodal Emotion Recognition (2024)
- Cross-modal Fusion Techniques For Utterance-level Emotion Recognition From Text And Speech (2023)
- BeMERC: Behavior-aware MLLM-based Framework For Multimodal Emotion Recognition In Conversation (2025)
- LLM Supervised Pre-training For Multimodal Emotion Recognition In Conversations (2025)
- Leveraging Label Potential For Enhanced Multimodal Emotion Recognition (2025)
- MIAR: Modality Interaction And Alignment Representation Fuison For Multimodal Emotion (2026)
- Quality-controlled Multimodal Emotion Recognition In Conversations With Identity-based Transfer Learning And MAMBA Fusion (2025)
- Multimodal Speech Emotion Recognition Using Audio And Text (2018)