Survey Of Visual-semantic Embedding Methods For Zero-shot Image Retrieval
2021 · Kazuya Ueki
Abstract
Visual-semantic embedding is an interesting research topic because it is useful for various tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on zero-shot image retrieval using sentences as queries and present a survey of the technological trends in this area. First, we provide a comprehensive overview of the history of the technology, starting with the early studies of image-to-text matching and tracing how the technology has evolved over time. We then describe the datasets commonly used in experiments and compare the evaluation results of each method. We also introduce implementations available on GitHub that can be used to confirm the accuracy of experiments and as a basis for further improvement. We hope that this survey will encourage researchers to further develop their research on bridging images and language.
Related papers
- Fine-grained Zero-shot Composed Image Retrieval With Complementary Visual-semantic Integration (2026)
- SERVAL: Surprisingly Effective Zero-shot Visual Document Retrieval Powered By Large Vision And Language Models (2025)
- Towards Zero-shot Cross-lingual Image Retrieval (2020)
- Multiple Visual-semantic Embedding For Video Retrieval From Query Sentence (2020)
- SETR: A Two-stage Semantic-enhanced Framework For Zero-shot Composed Image Retrieval (2025)
- iSEARLE: Improving Textual Inversion For Zero-shot Composed Image Retrieval (2024)
- Zero-shot Retrieval For Scalable Visual Search In A Two-sided Marketplace (2025)
- Visual Space Optimization For Zero-shot Learning (2019)