Enhancing Multimodal Sentiment Analysis For Missing Modality Through Self-distillation And Unified Modality Cross-attention
2024 · Yuzhe Weng, Haotian Wang, Tian Gao, et al.
Abstract
In multimodal sentiment analysis, collecting text data is often more challenging than collecting video or audio because of higher annotation costs and inconsistent automatic speech recognition (ASR) quality. To address this challenge, we develop a robust model that effectively integrates multimodal sentiment information even when the text modality is absent. Specifically, we propose a Double-Flow Self-Distillation Framework, including Unified Modality Cross-Attention (UMCA) and Modality Imagination Autoencoder (MIA), which excels at processing both the complete-modality scenario and the missing-text scenario. When the text modality is missing, the framework uses an LLM-based model to simulate the text representation from the audio modality, while the MIA module supplements information from the other two modalities to bring the simulated text representation closer to the real text representation. To further align the simulated and real representations, and to
Related papers
- DLF: Disentangled-language-focused Multimodal Sentiment Analysis (2024)
- Video-based Cross-modal Auxiliary Network For Multimodal Sentiment Analysis (2022)
- Getting The Subtext Without The Text: Scalable Multimodal Sentiment Classification From Visual And Acoustic Modalities (2018)
- Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions Of Visual-audio Content (2024)
- MIAR: Modality Interaction And Alignment Representation Fuison For Multimodal Emotion (2026)
- Exploiting Modality-invariant Feature For Robust Multimodal Emotion Recognition With Missing Modalities (2022)
- PSA-MF: Personality-sentiment Aligned Multi-level Fusion For Multimodal Sentiment Analysis (2025)
- Jointly Fine-tuning "bert-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020)