Learning Unseen Modality Interaction
2023 · Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek
Abstract
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive for generalization to unseen modality combinations during inference. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved. This allows the information to be accumulated with a simple summation operation across available modalities. To reduce overfitting to less discriminative modality combinations during training, we further improve the model learning with pseudo-supervision indicating the reliability of a modality's prediction. We demonstrate that our approach is effective for diverse tasks and modalities by evaluating it for multimodal video classification, robot state regression,
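The fusion idea in the abstract (project each modality into a common space, then sum over whichever modalities are available) can be illustrated with a minimal sketch. This is not the authors' implementation: the random linear projections, the modality names `video` and `audio`, the feature dimensions, and the helpers `make_projection`, `project`, and `fuse` are all assumptions made for illustration. The pseudo-supervision component is not covered.

```python
import random

def make_projection(in_dim, out_dim, seed):
    """Hypothetical stand-in for a learned projection: a fixed random matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def project(weights, features):
    """Map a modality-specific feature vector into the common space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def fuse(projections, inputs):
    """Sum the projected features of whichever modalities are present."""
    if not inputs:
        raise ValueError("at least one modality is required")
    common_dim = len(next(iter(projections.values())))
    fused = [0.0] * common_dim
    for name, features in inputs.items():
        projected = project(projections[name], features)
        fused = [a + b for a, b in zip(fused, projected)]
    return fused

COMMON_DIM = 4  # assumed size of the shared space
projections = {
    "video": make_projection(in_dim=6, out_dim=COMMON_DIM, seed=0),
    "audio": make_projection(in_dim=3, out_dim=COMMON_DIM, seed=1),
}

video = [0.5] * 6            # toy video features
audio = [1.0, -1.0, 0.25]    # toy audio features

# The same fusion function accepts any subset of modalities, so a
# combination never seen together during training needs no special case.
video_only = fuse(projections, {"video": video})
audio_only = fuse(projections, {"audio": audio})
both = fuse(projections, {"video": video, "audio": audio})
```

Because every modality lands in a space of the same size, the fused representation has a fixed shape regardless of how many modalities are present, which is what lets one downstream predictor serve all combinations in this sketch.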