Video-based Cross-modal Auxiliary Network For Multimodal Sentiment Analysis
2022 Β· Rongfei Chen, Wenju Zhou, Yang Li, et al.
Abstract
Multimodal sentiment analysis has a wide range of applications due to its information complementarity in multimodal interactions. Previous works focus more on investigating efficient joint representations, but they rarely consider the insufficient unimodal features extraction and data redundancy of multimodal fusion. In this paper, a Video-based Cross-modal Auxiliary Network (VCAN) is proposed, which is comprised of an audio features map module and a cross-modal selection module. The first module is designed to substantially increase feature diversity in audio feature extraction, aiming to improve classification accuracy by providing more comprehensive acoustic representations. To empower the model to handle redundant visual features, the second module is addressed to efficiently filter the redundant visual frames during integrating audiovisual data. Moreover, a classifier group consisting of several image classification networks is introduced to predict sentiment polarities and emotio
Authors
(none)
Tags
Stats
Related papers
- Getting The Subtext Without The Text: Scalable Multimodal Sentiment Classification From Visual And Acoustic Modalities (2018)7.50
- Multimodal Speech Emotion Recognition Using Cross Attention With Aligned Audio And Text (2022)9.76
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)4.52
- A Joint Cross-attention Model For Audio-visual Fusion In Dimensional Emotion Recognition (2022)18.00
- Enhancing Multimodal Sentiment Analysis For Missing Modality Through Self-distillation And Unified Modality Cross-attention (2024)6.71
- Multimodal Fusion With Deep Neural Networks For Audio-video Emotion Recognition (2019)0.00
- Continuous Multimodal Emotion Recognition Approach For AVEC 2017 (2017)0.00
- Multimodal Emotion Recognition And Sentiment Analysis In Multi-party Conversation Contexts (2025)0.00