Single-branch Network For Multimodal Training
2023 · Muhammad Saad Saeed, Shah Nawaz, Muhammad Haris Khan, et al.
Abstract
With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous multimodal applications and has become a de facto standard for handling multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained using either single or multiple modalities without sacrificing performance. We evaluated our proposed single-branch network on the challen
Related papers
- Breaking The Modality Barrier: Universal Embedding Learning With Multimodal LLMs (2025) 4.52
- Learning Unseen Modality Interaction (2023) 0.00
- Deep Unified Multimodal Embeddings For Understanding Both Content And Users In Social Media Networks (2019) 0.00
- Multimodal Prototypical Networks For Few-shot Learning (2020) 14.73
- Multimodal Contrastive Training For Visual Representation Learning (2021) 16.32
- Learning Deep Representation Of Multityped Objects And Tasks (2016) 0.00
- Modality Curation: Building Universal Embeddings For Advanced Multimodal Information Retrieval (2025) 0.00
- Everything At Once -- Multi-modal Fusion Transformer For Video Retrieval (2021) 15.78