CMSBERT-CLR: Context-driven Modality Shifting BERT With Contrastive Learning For Linguistic, Visual, Acoustic Representations
2022 · Junghun Kim, Jihie Kim
Abstract
Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common
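The abstract does not spell out the contrastive objective, so the following is only a generic illustration of how contrastive learning can pull paired modality embeddings together. It is a minimal sketch of an InfoNCE-style loss in pure Python, not the authors' formulation: the function names `cosine` and `info_nce`, the `temperature` value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors given as lists of floats.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(text_embs, other_embs, temperature=0.1):
    # Hypothetical helper: mean InfoNCE loss where text_embs[i] and
    # other_embs[i] form the positive pair and every other_embs[j], j != i,
    # in the batch acts as a negative.
    losses = []
    for i, t in enumerate(text_embs):
        logits = [cosine(t, o) / temperature for o in other_embs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the positive pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings (assumed values): two utterances, text vs. acoustic.
text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]   # pairs point in similar directions
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = info_nce(text, audio_aligned)
loss_swapped = info_nce(text, audio_swapped)
```

In this toy setup the loss is small when each text embedding is closest to its own acoustic embedding and large when the pairs are mismatched, which is the behavior a contrastive alignment objective is meant to reward.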
Related papers
- On The Use Of Modality-specific Large-scale Pre-trained Encoders For Multimodal Sentiment Analysis (2022) · 6.77
- CALM: Contrastive Aligned Audio-language Multirate And Multimodal Representations (2022) · 0.00
- Jointly Fine-tuning "bert-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020) · 13.74
- Cross-modal Contrastive Representation Learning For Audio-to-image Generation (2022) · 0.00
- Token-level Contrastive Learning With Modality-aware Prompting For Multimodal Intent Recognition (2023) · 14.17
- Contrastive Regularization For Multimodal Emotion Recognition Using Audio And Text (2022) · 0.00
- DLF: Disentangled-language-focused Multimodal Sentiment Analysis (2024) · 4.26
- CACARA: Cross-modal Alignment Leveraging A Text-centric Approach For Cost-effective Multimodal And Multilingual Learning (2025) · 0.00