Token-level Contrastive Learning With Modality-aware Prompting For Multimodal Intent Recognition
2023 · Qianrui Zhou, Hua Xu, Hao Li, et al.
Abstract
Multimodal intent recognition aims to leverage diverse modalities such as expressions, body movements and tone of speech to comprehend the user's intent, constituting a critical task for understanding human language and behavior in real-world multimodal scenarios. Nevertheless, the majority of existing methods ignore potential correlations among different modalities and are limited in their ability to learn semantic features from nonverbal modalities. In this paper, we introduce a token-level contrastive learning method with modality-aware prompting (TCL-MAP) to address the above challenges. To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which aligns and fuses features from the text, video and audio modalities using similarity-based modality alignment and a cross-modality attention mechanism. Based on the modality-aware prompt and ground truth labels, the proposed token-level contrastive learning framewo
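As a rough illustration of the cross-modality attention idea mentioned in the abstract, the sketch below lets text tokens attend over features from another modality using plain scaled dot-product attention. It is not the paper's MAP module: the function names, the toy 2-dimensional features, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(queries, keys_values):
    """Hypothetical helper: each query vector (e.g. a text token) attends
    over vectors from another modality (e.g. video frames).

    No learned projection matrices are used here, so the other-modality
    vectors serve as both keys and values. Returns the fused vectors and
    the attention weights.
    """
    d = len(queries[0])
    fused, weights = [], []
    for q in queries:
        # Scaled dot-product similarity between the query and every key.
        scores = [dot(q, k) / math.sqrt(d) for k in keys_values]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        fused.append(
            [sum(wi * kv[j] for wi, kv in zip(w, keys_values)) for j in range(d)]
        )
    return fused, weights


# Toy inputs (assumed, not from the paper): 2 text tokens, 3 video frames.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
video_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_modal_attention(text_tokens, video_frames)
```

Each row of `weights` sums to 1, and frames more similar to a given text token receive a larger share of that token's attention. A trained model would typically apply learned query, key and value projections before this step.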
Related papers
- CMSBERT-CLR: Context-driven Modality Shifting BERT With Contrastive Learning For Linguistic, Visual, Acoustic Representations (2022) 4.52
- Contrastive Regularization For Multimodal Emotion Recognition Using Audio And Text (2022) 0.00
- MIntRec: A New Dataset For Multimodal Intent Recognition (2022) 17.08
- Enhancing Multimodal Sentiment Analysis For Missing Modality Through Self-distillation And Unified Modality Cross-attention (2024) 6.71
- CLAPSpeech: Learning Prosody From Text Context With Contrastive Language-audio Pre-training (2023) 0.00
- CALM: Contrastive Aligned Audio-language Multirate And Multimodal Representations (2022) 0.00
- ChatBridge: Bridging Modalities With Large Language Model As A Language Catalyst (2023) 0.00
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019) 15.22