Composed Multi-modal Retrieval: A Survey Of Approaches And Applications
2025 · Kun Zhang, Jingyu Li, Zhe Li, et al.
Abstract
The burgeoning volume of multi-modal data necessitates advanced retrieval paradigms beyond unimodal and cross-modal approaches. Composed Multi-modal Retrieval (CMR) emerges as a pivotal next-generation technology, enabling users to query images or videos by integrating a reference visual input with textual modifications, thereby achieving unprecedented flexibility and precision. This paper provides a comprehensive survey of CMR, covering its fundamental challenges, technical advancements, and applications. CMR methods are categorized into supervised, zero-shot, and semi-supervised learning paradigms. We discuss the key research directions within each: data construction, model architecture, and loss optimization in supervised CMR; transformation frameworks and linear integration in zero-shot CMR; and, in semi-supervised CMR, the use of generated pseudo-triplets together with strategies for handling data noise and uncertainty. Additionally, we extensively survey the diverse application landscape of CMR, highlighting
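To make the composed-query idea concrete, here is a minimal sketch in pure Python. It assumes the reference image and the modification text have already been embedded into a shared space (the hand-made toy vectors stand in for encoder outputs such as CLIP features). `compose_query` shows one simple reading of "linear integration" (a weighted sum of normalized image and text embeddings), and `batch_contrastive_loss` shows an InfoNCE-style objective, a common choice for supervised training on (reference, modification, target) triplets. All function names, the weight `alpha`, and the `temperature` value are illustrative assumptions, not the survey's specific methods.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def compose_query(ref_image_emb: Sequence[float],
                  mod_text_emb: Sequence[float],
                  alpha: float = 0.5) -> Vector:
    """Hypothetical linear fusion: alpha * image + (1 - alpha) * text.

    Both inputs are L2-normalized first so neither modality dominates by scale.
    """
    img, txt = l2_normalize(ref_image_emb), l2_normalize(mod_text_emb)
    return l2_normalize([alpha * i + (1 - alpha) * t for i, t in zip(img, txt)])


def retrieve(query: Sequence[float],
             gallery: Dict[str, Vector],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank gallery items (id -> embedding) by cosine similarity to the query."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def batch_contrastive_loss(queries: List[Vector],
                           targets: List[Vector],
                           temperature: float = 0.07) -> float:
    """InfoNCE-style loss: query i should score highest against target i.

    The other targets in the batch act as negatives; the result is the mean
    cross-entropy over the batch, computed with a stable log-sum-exp.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, t) / temperature for t in targets]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)
```

With toy 3-dimensional vectors along the axes (dress, red, blue), a red-dress reference `[1.0, 1.0, 0.0]` composed with a "make it blue" modification `[0.0, -1.0, 1.0]` ranks a blue dress first, a blue shirt second, and the unmodified red dress last. Fixing `alpha` is the simplest design; learned or query-dependent weights are a natural extension.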
Related papers
- Cross-modal Retrieval: A Systematic Review Of Methods And Future Directions (2023)
- Semcore: A Semantic-enhanced Generative Cross-modal Retrieval Framework With MLLMs (2025)
- Clamr: Contextualized Late-interaction For Multimodal Content Retrieval (2025)
- A Comprehensive Empirical Study Of Vision-language Pre-trained Model For Supervised Cross-modal Retrieval (2022)
- Beyond Global Similarity: Towards Fine-grained, Multi-condition Multimodal Retrieval (2026)
- Generalized Contrastive Learning For Universal Multimodal Retrieval (2025)
- Docmmir: A Framework For Document Multi-modal Information Retrieval (2025)
- XR: Cross-modal Agents For Composed Image Retrieval (2026)