BeMERC: Behavior-aware MLLM-based Framework For Multimodal Emotion Recognition In Conversation
2025 · Yumeng Fu, Junjie Wu, Zhongjie Wang, et al.
Abstract
Multimodal emotion recognition in conversation (MERC), the task of identifying the emotion label for each utterance in a conversation, is vital for developing empathetic machines. Current MLLM-based MERC studies focus mainly on capturing the speaker's textual or vocal characteristics, but ignore the significance of video-derived behavior information. Unlike text and audio inputs, video carries rich facial expressions, body language, and posture, which give the model emotion-trigger signals for more accurate emotion prediction. In this paper, we propose a novel behavior-aware MLLM-based framework (BeMERC) that incorporates the speaker's behaviors, including subtle facial micro-expressions, body language, and posture, into a vanilla MLLM-based MERC model, thereby facilitating the modeling of emotional dynamics during a conversation. Furthermore, BeMERC adopts a two-stage instruction tuning strategy to extend the model to the conversational scenario for end-to-end training of a MERC
Related papers
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022)
- Quality-controlled Multimodal Emotion Recognition In Conversations With Identity-based Transfer Learning And MAMBA Fusion (2025)
- GatedxLSTM: A Multimodal Affective Computing Approach For Emotion Recognition In Conversations (2025)
- LLM Supervised Pre-training For Multimodal Emotion Recognition In Conversations (2025)
- A Comprehensive Survey On Multi-modal Conversational Emotion Recognition With Deep Learning (2023)
- Leveraging Label Potential For Enhanced Multimodal Emotion Recognition (2025)
- Whose Emotion Matters? Speaking Activity Localisation Without Prior Knowledge (2022)
- Dynamic Graph Neural ODE Network For Multi-modal Emotion Recognition In Conversation (2024)