On The Use Of Modality-specific Large-scale Pre-trained Encoders For Multimodal Sentiment Analysis
2022 · Atsushi Ando, Ryo Masumura, Akihiko Takashima, et al.
Abstract
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA). Although pre-trained encoders have been reported to be effective in various fields, conventional MSA methods employ them only for the linguistic modality, and their application to the other modalities has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. One of the largest publicly available pre-trained encoders is used for each modality: CLIP-ViT, WavLM, and BERT for the visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The codes are available at htt
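The abstract does not say how the intermediate-layer outputs are combined. As a minimal sketch, assuming a softmax-weighted sum over layers (a common choice, not confirmed by this abstract), the contrast with using only the output layer might look as follows. The function names `pool_layers` and `last_layer` are hypothetical, and the toy values merely stand in for real encoder hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_layers(hidden_states, layer_scores):
    """Combine per-layer features with softmax-normalized layer weights.

    hidden_states: list over layers, each a [frames][dim] nested list.
    layer_scores:  one unnormalized score per layer (learned in practice;
                   fixed by hand in this illustration).
    Returns a [frames][dim] nested list.
    """
    if len(hidden_states) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    n_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    pooled = [[0.0] * dim for _ in range(n_frames)]
    for w, layer in zip(weights, hidden_states):
        for t in range(n_frames):
            for d in range(dim):
                pooled[t][d] += w * layer[t][d]
    return pooled


def last_layer(hidden_states):
    """Baseline: use only the encoder's output layer."""
    return hidden_states[-1]


# Toy stand-in for encoder outputs: 3 layers, 2 frames, 2 dimensions.
toy = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]

uniform = pool_layers(toy, [0.0, 0.0, 0.0])   # equal weight on every layer
peaked = pool_layers(toy, [0.0, 0.0, 50.0])   # nearly all weight on the top layer
```

With equal scores the result is the plain average of the layers, and as one score dominates, the pooled features approach that single layer, so the output-layer baseline is a special case of the weighted sum.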
Related papers
- DLF: Disentangled-language-focused Multimodal Sentiment Analysis (2024)
- Advancing Audio Emotion And Intent Recognition With Large Pre-trained Models And Bayesian Inference (2023)
- CMSBERT-CLR: Context-driven Modality Shifting BERT With Contrastive Learning For Linguistic, Visual, Acoustic Representations (2022)
- PSA-MF: Personality-sentiment Aligned Multi-level Fusion For Multimodal Sentiment Analysis (2025)
- Enhancing Multimodal Sentiment Analysis For Missing Modality Through Self-distillation And Unified Modality Cross-attention (2024)
- Getting The Subtext Without The Text: Scalable Multimodal Sentiment Classification From Visual And Acoustic Modalities (2018)
- Jointly Fine-tuning "BERT-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020)
- Multimodal Emotion Recognition And Sentiment Analysis In Multi-party Conversation Contexts (2025)