Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions Of Visual-audio Content
2024 · Sheng Wu, Xiaobao Wang, Longbiao Wang, et al.
Abstract
Multimodal Sentiment Analysis (MSA) stands as a critical research frontier, seeking to comprehensively unravel human emotions by amalgamating text, audio, and visual data. Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge, particularly when emotional polarities across various segments appear similar. In this paper, our objective is to spotlight emotion-relevant attributes of audio and visual modalities to facilitate multimodal fusion in the context of nuanced emotional shifts in visual-audio scenarios. To this end, we introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions aimed at accentuating emotional features of visual-audio content. DEVA employs an Emotional Description Generator (EDG) to transmute raw audio and visual data into textualized sentiment descriptions, thereby amplifying their emotional characteristics. These descriptions are then integrated with the source data to yield richer, enh
Related papers
- Semantic Matters: Multimodal Features For Affective Analysis (2025)
- DLF: Disentangled-language-focused Multimodal Sentiment Analysis (2024)
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)
- Enhancing Multimodal Sentiment Analysis For Missing Modality Through Self-distillation And Unified Modality Cross-attention (2024)
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025)
- Av-emodialog: Chat With Audio-visual Users Leveraging Emotional Cues (2024)
- PSA-MF: Personality-sentiment Aligned Multi-level Fusion For Multimodal Sentiment Analysis (2025)
- Getting The Subtext Without The Text: Scalable Multimodal Sentiment Classification From Visual And Acoustic Modalities (2018)