Multimodal Emotion Recognition And Sentiment Analysis In Multi-party Conversation Contexts
2025 · Aref Farhadipour, Hossein Ranjbar, Masoumeh Chapariniya, et al.
Abstract
Emotion recognition and sentiment analysis are pivotal tasks in speech and language processing, particularly in real-world scenarios involving multi-party conversational data. This paper presents a multimodal approach to these tasks on a well-known dataset. The proposed system integrates four modalities/channels: RoBERTa for text and Wav2Vec2 for speech (both pre-trained), a proposed FacialNet for facial expressions, and a CNN+Transformer architecture trained from scratch for video analysis. The feature embeddings from each modality are concatenated into a single multimodal vector, which is then used to predict emotion and sentiment labels. The multimodal system outperforms the unimodal approaches, achieving an accuracy of 66.36% for emotion recognition and 72.15% for sentiment analysis.
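The fusion step described in the abstract (concatenate the per-modality embeddings, then classify the combined vector) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the embedding sizes, the seven-emotion and three-sentiment label counts, and the use of two separate linear heads are hypothetical, the weights are random rather than trained, and the stand-in embeddings replace real RoBERTa, Wav2Vec2, FacialNet, and CNN+Transformer outputs.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical embedding sizes per modality (assumptions, not from the paper).
MODALITY_DIMS: Dict[str, int] = {"text": 8, "speech": 8, "face": 4, "video": 4}
NUM_EMOTIONS = 7    # assumed size of the emotion label set
NUM_SENTIMENTS = 3  # assumed: negative / neutral / positive


def concat_fusion(embeddings: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-modality embeddings in a fixed modality order."""
    fused: List[float] = []
    for name, dim in MODALITY_DIMS.items():
        vec = embeddings[name]
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} dims, got {len(vec)}")
        fused.extend(vec)
    return fused


class LinearHead:
    """One linear layer followed by argmax (untrained, random weights)."""

    def __init__(self, in_dim: int, num_classes: int, rng: random.Random) -> None:
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def logits(self, x: Sequence[float]) -> List[float]:
        # One dot product per class, plus that class's bias.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

    def predict(self, x: Sequence[float]) -> int:
        scores = self.logits(x)
        return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
fused_dim = sum(MODALITY_DIMS.values())
emotion_head = LinearHead(fused_dim, NUM_EMOTIONS, rng)
sentiment_head = LinearHead(fused_dim, NUM_SENTIMENTS, rng)

# Stand-in embeddings; in a real system these would come from the
# text, speech, facial-expression, and video encoders.
sample = {name: [rng.gauss(0.0, 1.0) for _ in range(dim)]
          for name, dim in MODALITY_DIMS.items()}
fused = concat_fusion(sample)
print(len(fused), emotion_head.predict(fused), sentiment_head.predict(fused))
```

Concatenation keeps the four encoders independent: each can be swapped or retrained without changing the others, and only the classifier input size depends on the sum of the embedding dimensions. In practice the heads would be trained (for example with cross-entropy) on labelled utterances rather than left at random initialisation.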
Related papers
- Multi-modal Emotion Recognition By Text, Speech And Video Using Pretrained Transformers (2024) · 0.00
- Multimodal Speech Emotion Recognition Using Audio And Text (2018) · 18.02
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025) · 8.35
- Cross-modal Fusion Techniques For Utterance-level Emotion Recognition From Text And Speech (2023) · 9.59
- Multimodal Speech Emotion Recognition And Ambiguity Resolution (2019) · 0.00
- Multimodal Emotion Recognition Using Transfer Learning From Speaker Recognition And BERT-based Models (2022) · 12.10
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019) · 15.22
- EffMulti: Efficiently Modeling Complex Multimodal Interactions For Emotion Analysis (2022) · 0.00